From 8d07e27a585457e0d7a13e565560be80da1917f4 Mon Sep 17 00:00:00 2001
From: "mingjiang.li"
Date: Mon, 24 Mar 2025 14:33:25 +0800
Subject: [PATCH 1/2] unify readme format of rl models

---
 .../dqn/paddlepaddle/README.md | 34 ++++++++++--------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/reinforcement_learning/q-learning-networks/dqn/paddlepaddle/README.md b/reinforcement_learning/q-learning-networks/dqn/paddlepaddle/README.md
index d9423ebcd..f8b45af44 100644
--- a/reinforcement_learning/q-learning-networks/dqn/paddlepaddle/README.md
+++ b/reinforcement_learning/q-learning-networks/dqn/paddlepaddle/README.md
@@ -1,11 +1,16 @@
 # DQN

-## Model description
+## Model Description

-The classic DQN algorithm in reinforcement learning is a value-based rather than a policy-based method. DQN does not
-learn a policy, but a critic. Critic does not directly take action, but evaluates the quality of the action.
+DQN (Deep Q-Network) is a foundational reinforcement learning algorithm that combines Q-Learning with deep neural
+networks. As a value-based method, it uses a critic network to estimate action quality in high-dimensional state spaces.
+DQN introduces experience replay and target network stabilization to enable stable training. This approach
+revolutionized AI capabilities in complex environments, achieving human-level performance in Atari games and forming the
+basis for advanced decision-making systems in robotics and game AI.

-## Step 1: Installation
+## Model Preparation
+
+### Install Dependencies

 ```bash
 git clone https://github.com/PaddlePaddle/PARL.git
@@ -15,29 +20,26 @@ pip3 install matplotlib
 pip3 install urllib3==1.26.6
 ```

-## Step 2: Training
+## Model Training

 ```bash
-# 1 GPU
+# 1 GPU Training
 python3 train.py
-```

-## Step 3: Evaluating
-
-```bash
+# Evaluation
 mv ../../../evaluate.py ./
 python3 evaluate.py
 ```

-## Result
+## Model Results

-Performance of DQN playing CartPole-v0
+Performance of DQN playing CartPole-v0.
-| GPUs | Reward |
-|---------|--------|
-| BI-V100 | 200.0 |
+| Model | GPU | Reward |
+|-------|---------|--------|
+| DQN | BI-V100 | 200.0 |

 ## Reference

 - [PARL](https://github.com/PaddlePaddle/PARL)
-- [paper](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)
+- [Paper](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)
--
Gitee


From c05184125bc8d1292274b0129fbabd9d5ec1ce58 Mon Sep 17 00:00:00 2001
From: "mingjiang.li"
Date: Mon, 24 Mar 2025 14:33:38 +0800
Subject: [PATCH 2/2] update 25.03 release notes

---
 RELEASE.md | 43 +++++++++++++++++++++----------------------
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/RELEASE.md b/RELEASE.md
index a05197ced..3e205d319 100644
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -2,43 +2,43 @@

 ## DeepSparkHub 25.03 Release Notes

-### Features and Enhancements
+### Models and Algorithms

-#### Models and Algorithms
-● Added 9 large-model training examples, covering the MoE-LLaVA, DeepSpeed, and LLaMA-Factory toolboxes
+* Added 9 large-model training examples, covering the DeepSpeed, MoE-LLaVA, and LLaMA-Factory toolboxes

 Large Models

-MoE-LLaVA-Phi2-2.7B(MoE-LLaVA)
-MoE-LLaVA-Qwen-1.8B(MoE-LLaVA)
-MoE-LLaVA-StableLM-1.6B(MoE-LLaVA)
+GLM-4
+MiniCPM(DeepSpeed)
+Phi-3

-Yi_6B(DeepSpeed)
-Yi-1.5_6B(DeepSpeed)
-Yi-VL-6B(LLaMA-Factory)
+MoE-LLaVA-Phi2-2.7B
+MoE-LLaVA-Qwen-1.8B
+MoE-LLaVA-StableLM-1.6B

-GLM-4
-MiniCPM(DeepSpeed)
-Phi-3
+Yi-6B (DeepSpeed)
+Yi-1.5-6B (DeepSpeed)
+Yi-VL-6B (LLaMA-Factory)

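-● Updated category names such as cv/multi_object_tracking, cv/gnn, and cv/face_recognition.
-● Adjusted the category paths of models such as kan, graph wavenet, and hashnerf.
-● Removed redundant code from models such as convnext, co-detr, and centernet to align with the community versions.
-● Updated the READMEs of the affected models, adding the IXUCA SDK versions each model supports.
-● Updated the code of models such as ATSS, Cascade R-CNN, and CornerNet to work with the community MMDetection v3.3.0 release.
-● Added automated ci scripts for cv/classification and cv/detection.
-● Synced the code of the tacotron2 model.
+### Bug Fixes
+
+* Synced the latest code for the Tacotron2 PyTorch model.
+* Removed redundant code from models such as ConvNeXt, Co-DETR, and CenterNet, and aligned them with the community versions.
+* Updated the MMDetection toolbox to v3.3.0 and synced the code of models such as ATSS, Cascade R-CNN, and CornerNet.
+* Added automated CI scripts for cv/classification and cv/detection.
+* Updated the README format for all models and added the IXUCA SDK versions each model supports.
+
+### Version Compatibility

-#### Version Compatibility
 DeepSparkHub 25.03 corresponds to version 4.2.0 of the Iluvatar CoreX software stack.

-#### Contributors
+### Contributors

 Thanks to the following community contributors

@@ -46,7 +46,6 @@ DeepSparkHub 25.03 corresponds to version 4.2.0 of the Iluvatar CoreX software stack.

 Contributions to the DeepSparkHub project in any form are welcome.

-
 ## DeepSparkHub 24.12 Release Notes

 ### Features and Enhancements
--
Gitee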