From 8d955ed717558e5a47c95e2b01aaeb474ab2e984 Mon Sep 17 00:00:00 2001
From: majorli
Date: Tue, 13 Aug 2024 10:56:52 +0800
Subject: [PATCH 1/2] add llama3 8b megatron deepspeed model

Signed-off-by: majorli
---
 .../llama3-8b/megatron-deepspeed/README.md | 45 +++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100644 nlp/llm/llama3-8b/megatron-deepspeed/README.md

diff --git a/nlp/llm/llama3-8b/megatron-deepspeed/README.md b/nlp/llm/llama3-8b/megatron-deepspeed/README.md
new file mode 100644
index 000000000..f530a243f
--- /dev/null
+++ b/nlp/llm/llama3-8b/megatron-deepspeed/README.md
@@ -0,0 +1,45 @@

# Llama3-8B (Megatron-DeepSpeed)

## Model description

Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. Llama 3 uses a tokenizer with a vocabulary of 128K tokens and was trained on sequences of 8,192 tokens. Grouped-Query Attention (GQA) is used for all models to improve inference efficiency. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
## Step 1: Installation

```bash
# Clone the Megatron-DeepSpeed fork
git clone https://gitee.com/deep-spark/Megatron-DeepSpeed.git
cd Megatron-DeepSpeed/

# Build and install
bash build_megatron-deepspeed.sh && bash install_megatron-deepspeed.sh
pip3 install urllib3==1.23
```

## Step 2: Preparing datasets

```bash
pushd dataset
# Download and unpack gpt_small_117M_llama3.tar
wget http://files.deepspark.org.cn:880/deepspark/gpt_small_117M_llama3.tar
tar -xf gpt_small_117M_llama3.tar
rm -f gpt_small_117M_llama3.tar
popd
```

## Step 3: Training

```bash
# Point NCCL at the host's network interface (adjust "eth0" to your NIC)
export NCCL_SOCKET_IFNAME="eth0"
cd examples/llama3
bash run_te_llama3_8b_node1.sh
```

## Results

| GPUs    | Model                          | Training speed |
| :-----: | :----------------------------: | :------------: |
| BI-V150 | Llama3-8B (Megatron-DeepSpeed) |                |

## Reference

- [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed)
-- 
Gitee

From 4c98baa870438bec98a9b540010da5e914a45e7b Mon Sep 17 00:00:00 2001
From: "mingjiang.li"
Date: Mon, 2 Sep 2024 15:33:41 +0800
Subject: [PATCH 2/2] add llama3-8b megatron-deepspeed to model list

Signed-off-by: mingjiang.li
---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index d2118ca6a..987cf4f59 100644
--- a/README.md
+++ b/README.md
@@ -447,6 +447,7 @@ DeepSparkHub甄选上百个应用算法和模型，覆盖AI和通用计算各领
 [Llama2-7B SFT](nlp/llm/llama2-7b_sft/megatron-deepspeed/README.md) | PyTorch (Megatron-DeepSpeed) | gpt_small-117M
 [Llama2-13B](nlp/llm/llama2-13b/megatron-deepspeed/README.md) | PyTorch (Megatron-DeepSpeed) | Bookcorpus
 [Llama2-34B](nlp/llm/llama2-34b/megatron-deepspeed/README.md) | PyTorch (Megatron-DeepSpeed) | Bookcorpus
+[Llama3-8B](nlp/llm/llama3-8b/megatron-deepspeed/README.md) | PyTorch (Megatron-DeepSpeed) | Bookcorpus
 [QWen-7B](nlp/llm/qwen-7b/firefly/README.md) | PyTorch (Firefly) | qwen-7b
 [QWen1.5-7B](nlp/llm/qwen1.5-7b/firefly/README.md) | PyTorch (Firefly) | school_math
 [QWen1.5-14B](nlp/llm/qwen1.5-14b/firefly/README.md) | PyTorch (Firefly) | school_math
-- 
Gitee
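The README added by this patch series assumes `git`, `wget`, `tar`, and `pip3` are available before the clone/build/dataset/train steps run. A minimal preflight sketch (not part of the patch; the helper name `check_cmd` is hypothetical) can confirm those tools are present up front rather than failing midway through Step 2:

```shell
#!/usr/bin/env bash
# Preflight sketch (hypothetical, not part of the patch): verify the
# commands the README's three steps depend on before starting.

check_cmd() {
  # Print "ok: <cmd>" if <cmd> is on PATH, "missing: <cmd>" otherwise.
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

# Tools used by Step 1 (git, bash, pip3) and Step 2 (wget, tar).
for c in git bash pip3 wget tar; do
  check_cmd "$c"
done
```

Any `missing:` line means the corresponding README step would fail on that host; `command -v` is used instead of `which` because it is specified by POSIX and needs no external binary.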