# MyTinyVLA

**Repository Path**: A_lattice/my-tiny-vla

## Basic Information

- **Project Name**: MyTinyVLA
- **Description**: This project improves on TinyVLA. It addresses several defects in deploying and testing the TinyVLA source code by adding more detailed procedures for source code deployment, model downloading, and dataset customization.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2025-12-29
- **Last Updated**: 2025-12-29

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation

* **The TinyVLA source code has some issues in deployment and testing. This project adds more detailed procedures for source code deployment, model downloading, and dataset customization.**

[![arXiv](https://img.shields.io/badge/Arxiv-2409.12514-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2409.12514)

## 📰 News
* **`Feb. 17th, 2025`**: 🔥🔥🔥Our code is released!
* **`Feb. 9th, 2025`**: 🔥🔥🔥**TinyVLA** is accepted by IEEE Robotics and Automation Letters (RA-L) 2025!
* **`Nov. 19th, 2024`**: **TinyVLA** is out! **Paper** can be found [here](https://arxiv.org/abs/2409.12514). The **project web** can be found [here](https://tiny-vla.github.io/).

## Contents
- [📰 News](#-news)
- [Contents](#contents)
- [Install](#install)
- [Data Preparation](#data-preparation)
- [Download Pretrained VLM](#download-pretrained-vlm)
- [Train](#train)
- [Evaluation](#evaluation)
- [Acknowledgement](#acknowledgement)
- [Citation](#citation)
- [Model Inference](#model-inference)

## Install

1. Clone the repository (from either of the two sources below) and navigate into the cloned folder
```bash
git clone https://github.com/liyaxuanliyaxuan/TinyVLA
git clone https://gitee.com/hanssjtuer/my-tiny-vla
```

2. Install Package
```Shell
conda create -n tinyvla python=3.10 -y
conda activate tinyvla
pip install --upgrade pip
# pip install -r requirements.txt
cd policy_heads
pip install -e .
# install llava-pythia
cd ../llava-pythia
pip install -e .
# download model
cd ../models
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download lesjie/Llava-Pythia-400M --local-dir ./Llava-Pythia-400M
# create data
python3 create_data.py
python3 data2h5.py
python3 cut_h5.py
```

## Data Preparation
1. Our data format is the same as [act](https://github.com/MarkFzp/act-plus-plus), so you need to convert your data into h5py format. You can refer to [rlds_to_h5py.py](https://github.com/lesjie-wen/tinyvla/blob/main/data_utils/rlds_to_h5py.py), which converts data from rlds format to h5py format.
```angular2html
# h5 data structure
root
  |-action (100,10)
  |-language_raw (1,)
  |-observations
      |-images # multi-view
          |-left (100,480,640,3)
          |-right (100,480,640,3)
          |-wrist (100,480,640,3)
      |-joint_positions (100,7)
      |-qpos (100,7)
      |-qvel (100,7)
```
2. You have to add one entry in [constants.py](https://github.com/lesjie-wen/tinyvla/blob/main/aloha_scripts/constants.py) to specify the path of your data as follows.
```python
'your_task_name':{
    'dataset_dir': DATA_DIR + '/your_task_path', # path of the dataset
    'episode_len': 1000, # max length of the episode
    'camera_names': ['front', 'wrist'] # camera names, used as the keys when reading image data
}
```

## Download Pretrained VLM
We construct the VLM backbone by integrating a series of tiny LLMs ([Pythia](https://github.com/EleutherAI/pythia)) into the [Llava](https://github.com/haotian-liu/LLaVA) framework. We follow the standard training pipeline and data provided by [Llava](https://github.com/haotian-liu/LLaVA). All the VLM weights used in our paper are listed below:

| Model               | Usage         | Link                                                           |
|---------------------|---------------|----------------------------------------------------------------|
| Llava-Pythia(~400M) | For TinyVLA-S | [huggingface](https://huggingface.co/lesjie/Llava-Pythia-400M) |
| Llava-Pythia(~700M) | For TinyVLA-B | [huggingface](https://huggingface.co/lesjie/Llava-Pythia-700M) |
| Llava-Pythia(~1.3B) | For TinyVLA-H | [huggingface](https://huggingface.co/lesjie/Llava-Pythia-1.3B) |

## Train
The training script is "scripts/train.sh". You need to change the following parameters:
1. **OUTPUT**: the save directory for training, which must include the keyword "llava_pythia" (and optionally "lora"). If LoRA training is used, the name must include "lora" (e.g., "llava_pythia_lora").
2. **task_name**: the task used for training, which should correspond to "your_task_name" in aloha_scripts/constants.py
3. **model_name_or_path**: path to the pretrained VLM weights
4. Other hyperparameters such as "batch_size" and "save_steps" can be customized according to your computation resources.
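
The naming rule for **OUTPUT** can be checked before launching a long run. The helper below is a minimal sketch and not part of the repository: the function name `check_output_dir` and the `use_lora` flag are hypothetical, and it assumes the keywords may appear anywhere in the path string, which the description above does not state explicitly.

```python
def check_output_dir(output_dir: str, use_lora: bool) -> list[str]:
    """Return problems with the OUTPUT path (empty list if none).

    Hypothetical helper: it only encodes the naming rule described above
    and assumes the keywords may appear anywhere in the path string.
    """
    problems = []
    if "llava_pythia" not in output_dir:
        problems.append('OUTPUT must include "llava_pythia"')
    if use_lora and "lora" not in output_dir:
        problems.append('LoRA training requires "lora" in OUTPUT')
    return problems


if __name__ == "__main__":
    # Example paths are placeholders, not paths used by the repository.
    for path, lora in [("./ckpts/llava_pythia_lora_demo", True), ("./ckpts/run1", True)]:
        issues = check_output_dir(path, lora)
        print(path, "->", "ok" if not issues else "; ".join(issues))
```

Returning a list of messages rather than raising keeps the check easy to call from a wrapper around `train.sh`, where all problems can be reported at once.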

Start training with the following commands. Before the first run of ./scripts/train.sh, make the script executable:
```shell
chmod a+x ./scripts/train.sh
```
Then run:
```shell
./scripts/train.sh
```

## Evaluation
Before evaluation, we provide a post-processing script that generates smaller, ready-to-use weights. The script is "scripts/process_ckpts.sh". You need to change the following parameters:
1. **source_dir**: path to the trained VLA directory, equal to **OUTPUT** in train.sh
2. **target_dir**: path where the processed VLA weights are saved

You can refer to our evaluation script [eval_real_franka.py](https://github.com/lesjie-wen/tinyvla/blob/main/eval_real_franka.py).

## Acknowledgement
We build our project based on:
- [LLaVA](https://github.com/haotian-liu/LLaVA): an amazing open-sourced project for vision language assistant
- [act-plus-plus](https://github.com/MarkFzp/act-plus-plus): an amazing open-sourced project for robotics visuomotor learning
- [Miphi](https://github.com/zhuyiche/llava-phi): an amazing open-sourced project for tiny vision language model

## Citation

If you find Tiny-VLA useful for your research and applications, please cite using this BibTeX:
```bibtex
@inproceedings{wen2024tinyvla,
  title={Tinyvla: Towards fast, data-efficient vision-language-action models for robotic manipulation},
  author={Wen, Junjie and Zhu, Yichen and Li, Jinming and Zhu, Minjie and Wu, Kun and Xu, Zhiyuan and Liu, Ning and Cheng, Ran and Shen, Chaomin and Peng, Yaxin and others},
  booktitle={IEEE Robotics and Automation Letters (RA-L)},
  year={2025}
}
```

## Model Inference
We provide a simple script to run inference with the trained TinyVLA model. You can find the script in `tests/run_model.py`. Run it as follows:
```shell
python3 tests/run_model.py
```
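
Before training or running inference on custom data, it can help to confirm that an episode matches the h5 layout from Data Preparation and that the `camera_names` of the task entry exist as image keys. The sketch below is hypothetical and not part of the repository: `check_episode` works on a plain mapping from dataset path to shape (with `h5py`, such a mapping could be collected using `File.visititems`), and it assumes, based on the example layout, that every array except `language_raw` shares the same leading time dimension as `action`.

```python
def check_episode(shapes: dict[str, tuple], camera_names: list[str]) -> list[str]:
    """Return problems found in one episode's dataset shapes (empty list if none).

    Hypothetical helper; `shapes` maps dataset paths such as
    "observations/qpos" to their array shapes.
    """
    if "action" not in shapes:
        return ["missing dataset: action"]
    steps = shapes["action"][0]  # number of time steps in the episode
    problems = []
    # Every configured camera must exist as a 4-D image array (T, H, W, C).
    for cam in camera_names:
        key = f"observations/images/{cam}"
        if key not in shapes:
            problems.append(f"missing camera: {key}")
        elif len(shapes[key]) != 4:
            problems.append(f"{key} should be (T, H, W, C), got {shapes[key]}")
    # Assumption: all arrays except language_raw share the time dimension.
    for key, shape in shapes.items():
        if key != "language_raw" and shape[0] != steps:
            problems.append(f"{key} has {shape[0]} steps, expected {steps}")
    return problems


if __name__ == "__main__":
    # Shapes copied from the h5 data structure shown in Data Preparation.
    example = {
        "action": (100, 10),
        "language_raw": (1,),
        "observations/images/left": (100, 480, 640, 3),
        "observations/images/right": (100, 480, 640, 3),
        "observations/images/wrist": (100, 480, 640, 3),
        "observations/joint_positions": (100, 7),
        "observations/qpos": (100, 7),
        "observations/qvel": (100, 7),
    }
    print(check_episode(example, ["left", "right", "wrist"]))  # []
    print(check_episode(example, ["front", "wrist"]))  # ['missing camera: observations/images/front']
```

The second call shows why the camera names in the task entry matter: a name such as `front` that has no matching key under `observations/images` is reported before any data is read.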