diff --git a/README.md b/README.md
index b6383fb1f81f79fce9450646d23d6717a25dd17c..58a4fe20d9df7d7a6c6b9e9343232e37baac6c20 100644
--- a/README.md
+++ b/README.md
@@ -396,6 +396,7 @@ DeepSparkHub甄选上百个应用算法和模型,覆盖AI和通用计算各领
 [CLIP](multimodal/Language-Image_Pre-Training/clip/pytorch/README.md) | PyTorch | CIFAR100
 [ControlNet](multimodal/diffusion/ControlNet/README.md) | PyTorch | Fill50K
 [DDPM](multimodal/diffusion/ddpm/README.md) | PyTorch | CIFAR-10
+[LLaVA](multimodal/llava/pytorch/README.md) | PyTorch | LLaVA-Pretrain
 [L-Verse](multimodal/Language-Image_Pre-Training/L-Verse/pytorch/README.md) | PyTorch | ImageNet
 [Stable Diffusion 1.4](multimodal/diffusion/stable-diffusion/training/README.md) | PyTorch | pokemon-images
 [Stable Diffusion 1.5](multimodal/diffusion/stable-diffusion/sd_1.5/README.md) | PyTorch | pokemon-images
diff --git a/multimodal/llava/pytorch/README.md b/multimodal/llava/pytorch/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..9659eb432bac133a22d8a112e671218a47987096
--- /dev/null
+++ b/multimodal/llava/pytorch/README.md
@@ -0,0 +1,55 @@
+# LLaVA 1.5
+
+## Model description
+
+LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal
+instruction-following data. It is an auto-regressive language model based on the transformer
+architecture.
+
+## Step 1: Preparation
+
+```bash
+git clone --depth 1 https://github.com/haotian-liu/LLaVA
+cd LLaVA/
+git checkout c121f0432da27facab705978f83c4ada465e46fd
+mkdir -p checkpoints/
+mkdir -p data/
+```
+
+### Weights
+
+Download [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf) and place it
+under `checkpoints/`.
+
+Download [openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14)
+and place it under `checkpoints/`.
+
+### Datasets
+
+Download [liuhaotian/LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain)
+and place it under `data/`.
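+
+The three downloads above can also be scripted. A minimal sketch using `huggingface-cli` from the
+`huggingface_hub` package (the `--local-dir` target paths are an assumption; match them to whatever
+layout `train.sh` expects):
+
+```bash
+pip3 install huggingface_hub
+huggingface-cli download llava-hf/llava-1.5-7b-hf --local-dir checkpoints/llava-1.5-7b-hf
+huggingface-cli download openai/clip-vit-large-patch14 --local-dir checkpoints/clip-vit-large-patch14
+huggingface-cli download liuhaotian/LLaVA-Pretrain --repo-type dataset --local-dir data/LLaVA-Pretrain
+```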
+
+## Step 2: Installation
+
+```bash
+
+pip3 install -e .
+pip3 install -e ".[train]"
+pip3 install protobuf
+```
+
+## Step 3: Training
+
+```bash
+bash train.sh
+```
+
+## Results
+
+| Model        | GPUs    | Seconds per iteration |
+| ------------ | ------- | --------------------- |
+| LLaVA 1.5 7B | BI-V150 | 8.01                  |
+
+## Reference
+
+- [LLaVA](https://github.com/haotian-liu/LLaVA)
\ No newline at end of file