
Our best model, TinyLLaVA-Phi-2-SigLIP-3.1B, achieves better overall performance than existing 7B models such as LLaVA-1.5 and Qwen-VL.
TinyLLaVA Factory is an open-source modular codebase for small-scale large multimodal models (LMMs), implemented in PyTorch and HuggingFace, with a focus on simplicity of code implementation, extensibility of new features, and reproducibility of training results.
With TinyLLaVA Factory, you can customize your own large multimodal model with less coding effort and fewer coding mistakes.
TinyLLaVA Factory integrates a suite of cutting-edge models and methods.

- LLM currently supports OpenELM, TinyLlama, StableLM, Qwen, Gemma, and Phi.
- Vision tower currently supports CLIP, SigLIP, DINO, and the combination of CLIP and DINO.
- Connector currently supports MLP, Qformer, and Resampler.
- Training recipe currently supports frozen/fully/partially tuning and LoRA/QLoRA tuning.
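These components are selected by name strings (e.g. LLM_VERSION, VT_VERSION, CN_VERSION) and wired together through string-keyed registries, which the customization sections below use via the register_llm, register_vision_tower, and register_connector decorators. As a generic illustration of that registry pattern (this is a simplified sketch, not the repo's actual factory code):

```python
# Illustrative sketch of a string-keyed registry, similar in spirit to the
# register_llm / register_vision_tower / register_connector decorators used later.
# NOT the repo's actual implementation.
from typing import Callable, Dict

LLM_REGISTRY: Dict[str, Callable] = {}

def register_llm(name: str):
    """Decorator that stores a factory function under a version string."""
    def wrapper(fn: Callable) -> Callable:
        LLM_REGISTRY[name] = fn
        return fn
    return wrapper

@register_llm('gemma')  # hypothetical entry for illustration
def return_gemmaclass():
    return "GemmaForCausalLM placeholder"

def LLMFactory(version: str):
    """Look up a registered LLM by the version string used in training scripts."""
    return LLM_REGISTRY[version]()

print(LLMFactory('gemma'))
```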
Please note that our environment requirements differ from LLaVA's. We strongly recommend creating the environment from scratch.
git clone https://github.com/TinyLLaVA/TinyLLaVA_Factory.git
cd TinyLLaVA_Factory
conda create -n tinyllava_factory python=3.10 -y
conda activate tinyllava_factory
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
pip install flash-attn --no-build-isolation

To upgrade to the latest code base:

git pull
pip install -e .

Please refer to the Data Preparation section in our documentation.
Here is an example of training an LMM using Phi-2.

- Replace the data paths with yours in scripts/train/train_phi.sh
- Replace output_dir with yours in scripts/train/pretrain.sh
- Replace pretrained_model_path and output_dir with yours in scripts/train/finetune.sh
- Adjust per_device_train_batch_size in scripts/train/pretrain.sh and scripts/train/finetune.sh to fit your hardware
- Run bash scripts/train/train_phi.sh

Important hyperparameters used in pretraining and finetuning are provided below.
| Training Stage | Global Batch Size | Learning Rate | conv_version |
|---|---|---|---|
| Pretraining | 256 | 1e-3 | pretrain |
| Finetuning | 128 | 2e-5 | phi |
Tips:

- Global batch size = number of GPUs × per_device_train_batch_size × gradient_accumulation_steps. We recommend keeping the global batch size and learning rate as listed above, except when LoRA-tuning your model (a short numeric example follows these tips).
- conv_version is the hyperparameter that selects the chat template for a given LLM. In the pretraining stage, conv_version is pretrain for all LLMs. In the finetuning stage, we use:
  - phi for Phi-2, StableLM, and Qwen-1.5
  - llama for TinyLlama and OpenELM
  - gemma for Gemma
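For instance, a hypothetical configuration reproducing the pretraining global batch size of 256 might look like this (the GPU count and per-device sizes are assumptions; only their product needs to match):

```python
# Hypothetical numbers; adjust num_gpus and per-device size to your hardware.
num_gpus = 8
per_device_train_batch_size = 8
gradient_accumulation_steps = 4

global_batch_size = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
assert global_batch_size == 256  # matches the pretraining setting in the table above
```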
Please refer to the Evaluation section in our documentation.
Models trained with TinyLLaVA Factory:
| VT (HF path) | LLM (HF path) | Recipe | VQA-v2 | GQA | SQA-image | TextVQA | MM-Vet | POPE | MME | MMMU-val |
|---|---|---|---|---|---|---|---|---|---|---|
| openai/clip-vit-large-patch14-336 | apple/OpenELM-450M-Instruct | base | 69.5 | 52.1 | 50.6 | 40.4 | 20.0 | 83.6 | 1052.9 | 23.9 |
| google/siglip-so400m-patch14-384 | apple/OpenELM-450M-Instruct | base | 71.7 | 53.9 | 54.1 | 44.0 | 20.0 | 85.4 | 1118.8 | 24.0 |
| google/siglip-so400m-patch14-384 | Qwen/Qwen2-0.5B | base | 72.3 | 55.8 | 60.1 | 45.2 | 19.5 | 86.6 | 1153.0 | 29.7 |
| google/siglip-so400m-patch14-384 | Qwen/Qwen2.5-0.5B | base | 75.3 | 59.5 | 60.3 | 48.3 | 23.9 | 86.1 | 1253.0 | 33.3 |
| google/siglip-so400m-patch14-384 | Qwen/Qwen2.5-3B | base | 79.4 | 62.5 | 74.1 | 58.3 | 34.8 | 87.4 | 1438.7 | 39.9 |
| openai/clip-vit-large-patch14-336 | TinyLlama/TinyLlama-1.1B-Chat-v1.0 | base | 73.7 | 58.0 | 59.9 | 46.3 | 23.2 | 85.5 | 1284.6 | 27.9 |
| google/siglip-so400m-patch14-384 | TinyLlama/TinyLlama-1.1B-Chat-v1.0 | base | 75.5 | 58.6 | 64.0 | 49.6 | 23.5 | 86.3 | 1256.5 | 28.3 |
| openai/clip-vit-large-patch14-336 | stabilityai/stablelm-2-zephyr-1_6b | base | 75.9 | 59.5 | 64.6 | 50.5 | 27.3 | 86.1 | 1368.1 | 31.8 |
| google/siglip-so400m-patch14-384 | stabilityai/stablelm-2-zephyr-1_6b | base | 78.2 | 60.7 | 66.7 | 56.0 | 29.4 | 86.3 | 1319.3 | 32.6 |
| google/siglip-so400m-patch14-384 | google/gemma-2b-it | base | 78.4 | 61.6 | 64.4 | 53.6 | 26.9 | 86.4 | 1339.0 | 31.7 |
| openai/clip-vit-large-patch14-336 | microsoft/phi-2 | base | 76.8 | 59.4 | 71.2 | 53.4 | 31.7 | 86.8 | 1448.6 | 36.3 |
| google/siglip-so400m-patch14-384 | microsoft/phi-2 | base | 79.2 | 61.6 | 71.9 | 57.4 | 35.0 | 87.2 | 1462.4 | 38.2 |
| google/siglip-so400m-patch14-384 | microsoft/phi-2 | base&lora | 77.6 | 59.7 | 71.6 | 53.8 | 33.3 | 87.9 | 1413.2 | 35.6 |
| google/siglip-so400m-patch14-384 | microsoft/phi-2 | share | 80.1 | 62.1 | 73.0 | 60.3 | 37.5 | 87.2 | 1466.4 | 38.4 |
Models trained with the legacy codebase TinyLLaVABench:
If your model was trained with our legacy codebase TinyLLaVABench and you still want to use it, here is an example of using the legacy model TinyLLaVA-3.1B.
from tinyllava.eval.run_tiny_llava import eval_model
from tinyllava.model.convert_legecy_weights_to_tinyllavafactory import *

model = convert_legecy_weights_to_tinyllavafactory('bczhou/TinyLLaVA-3.1B')

prompt = "What are the things I should be cautious about when I visit here?"
image_file = "https://llava-vl.github.io/static/images/view.jpg"

args = type('Args', (), {
    "model_path": None,
    "model": model,
    "query": prompt,
    "conv_mode": "phi",  # the same as conv_version in the training stage; different LLMs use different conv_mode/conv_version, so replace it accordingly
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()

eval_model(args)

"""
Output:
When visiting this serene lakeside location with a wooden dock, there are a few things to be cautious about. First, ensure that the dock is stable and secure before stepping onto it, as it might be slippery or wet, especially if it's a wooden structure. Second, be mindful of the surrounding water, as it can be deep or have hidden obstacles, such as rocks or debris, that could pose a risk. Additionally, be aware of the weather conditions, as sudden changes in weather can make the area more dangerous. Lastly, respect the natural environment and wildlife, and avoid littering or disturbing the ecosystem.
"""

Launch a local web demo by running:
python tinyllava/serve/app.py --model-path tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B

We also support running inference with the CLI. To use our model, run:

python -m tinyllava.serve.cli \
    --model-path tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B \
    --image-file "./tinyllava/serve/examples/extreme_ironing.jpg"

If you want to launch a model trained by yourself or by us locally, here is an example.
from tinyllava.eval.run_tiny_llava import eval_model

model_path = "/absolute/path/to/your/model/"
prompt = "What are the things I should be cautious about when I visit here?"
image_file = "https://llava-vl.github.io/static/images/view.jpg"
conv_mode = "phi"  # or llama, gemma, etc.

args = type('Args', (), {
    "model_path": model_path,
    "model": None,
    "query": prompt,
    "conv_mode": conv_mode,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()

eval_model(args)
"""
Output:
XXXXXXXXXXXXXXXXX
""" from transformers import AutoTokenizer , AutoModelForCausalLM
hf_path = 'tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B'
model = AutoModelForCausalLM . from_pretrained ( hf_path , trust_remote_code = True )
model . cuda ()
config = model . config
tokenizer = AutoTokenizer . from_pretrained ( hf_path , use_fast = False , model_max_length = config . tokenizer_model_max_length , padding_side = config . tokenizer_padding_side )
prompt = "What are these?"
image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
output_text , genertaion_time = model . chat ( prompt = prompt , image = image_url , tokenizer = tokenizer )
print ( 'model output:' , output_text )
print ( 'runing time:' , genertaion_time )如果您想使用自定義數據集使用Finetune Tinyllava,請參考此處。
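As a rough sketch, custom finetuning data is typically converted into a LLaVA-style annotation JSON before training; the exact field names expected by the codebase are documented there, so treat the structure below as an assumption:

```python
import json

# A minimal, hypothetical LLaVA-style annotation entry: one image plus a list of
# human/gpt conversation turns, with the <image> placeholder in the first question.
custom_data = [
    {
        "id": "sample-0001",
        "image": "images/sample-0001.jpg",
        "conversations": [
            {"from": "human", "value": "<image>\nWhat is shown in this picture?"},
            {"from": "gpt", "value": "A wooden dock extending over a calm lake."}
        ]
    }
]

with open("custom_dataset.json", "w") as f:
    json.dump(custom_data, f, indent=2)
```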
If you want to add a new LLM by yourself, you need to create two files: one for the chat template and the other for the language model, under the folders tinyllava/data/template/ and tinyllava/model/llm/ respectively.
Here is an example of adding the Gemma model.
First, create tinyllava/data/template/gemma_template.py, which will be used in the finetuning stage.
from dataclasses import dataclass
from typing import TYPE_CHECKING, Dict, List, Optional, Sequence, Tuple, Union
from packaging import version

from .formatter import EmptyFormatter, StringFormatter
from .base import Template
from .formatter import Formatter
from . import register_template
from ...utils.constants import *

from transformers import PreTrainedTokenizer
import torch
import tokenizers

system = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."

@register_template('gemma')  # Enable the TemplateFactory to obtain the added template by this string ('gemma').
@dataclass
class GemmaTemplate(Template):
    format_image_token: "Formatter" = StringFormatter(slot="<image>\n{{content}}")
    format_user: "Formatter" = StringFormatter(slot="USER" + ": " + "{{content}}" + " ")
    format_assistant: "Formatter" = StringFormatter(slot="ASSISTANT" + ": " + "{{content}}" + "<eos>")  # to be modified according to the tokenizer you choose
    system: "Formatter" = EmptyFormatter(slot=system + " ")
    separator: "Formatter" = EmptyFormatter(slot=[' ASSISTANT: ', '<eos>'])  # to be modified according to the tokenizer you choose

    def _make_masks(self, labels, tokenizer, sep, eos_token_length, rounds):
        # your code here
        return labels, cur_len

Tips:
Please make sure that labels (returned by the _make_masks function) follows this format: answer tokens and the EOS token id are left unmasked, while all other tokens are masked with -100.
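As a minimal sketch of what _make_masks might look like (not the repo's reference implementation), a LLaVA-style version walks over the conversation rounds, masks each instruction span with -100, and leaves the answer plus the EOS token unmasked. The exact token-length offsets depend on the tokenizer you choose, and labels is assumed here to be a torch.LongTensor:

```python
IGNORE_INDEX = -100  # assumed to match the constant imported from ...utils.constants

# To be placed inside your Template subclass (hence the `self` parameter).
def _make_masks(self, labels, tokenizer, sep, eos_token_length, rounds):
    # Sketch of LLaVA-style masking: for each round "USER: ... ASSISTANT: ...",
    # mask the prompt part and keep the answer plus the EOS token id unmasked.
    cur_len = 0
    for rou in rounds:
        if rou == "":
            break
        parts = rou.split(sep)  # sep is the " ASSISTANT: " separator
        if len(parts) != 2:
            break
        parts[0] += sep
        round_len = len(tokenizer(rou).input_ids) + eos_token_length
        instruction_len = len(tokenizer(parts[0]).input_ids)  # offset may need tuning per tokenizer
        labels[cur_len:cur_len + instruction_len] = IGNORE_INDEX  # mask the instruction span
        cur_len += round_len
    labels[cur_len:] = IGNORE_INDEX  # mask any trailing tokens/padding
    return labels, cur_len
```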
Second, create tinyllava/model/llm/gemma.py.
from transformers import GemmaForCausalLM, AutoTokenizer
# The LLM you want to add, along with its corresponding tokenizer.

from . import register_llm

# Add GemmaForCausalLM along with its corresponding tokenizer, and handle special tokens.
@register_llm('gemma')  # Enable the LLMFactory to obtain the added LLM by this string ('gemma').
def return_gemmaclass():
    def tokenizer_and_post_load(tokenizer):
        tokenizer.pad_token = tokenizer.unk_token
        return tokenizer
    return (GemmaForCausalLM, (AutoTokenizer, tokenizer_and_post_load))

Finally, create scripts/train/train_gemma.sh with the corresponding LLM_VERSION and CONV_VERSION.
If you want to add a new vision tower, you need to implement a new vision tower class that inherits from the base class VisionTower. Here is an example of the MoF vision tower.
First, create tinyllava/model/vision_tower/mof.py.
@register_vision_tower('mof')
class MoFVisionTower(VisionTower):
    def __init__(self, cfg):
        super().__init__(cfg)
        self._vision_tower = MoF(cfg)
        self._image_processor = ...  # your image processor

    def _load_model(self, vision_tower_name, **kwargs):
        ...  # your code here; make sure your model can be correctly loaded from pretrained parameters, either via HuggingFace or PyTorch loading

    def forward(self, x, **kwargs):
        ...  # your code here

Then, modify your training script with the corresponding VT_VERSION.
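The example above instantiates a MoF module that is not shown. Purely as an illustration of the mixture-of-features idea (combining features from two vision backbones, e.g. CLIP and DINO), a hypothetical MoF wrapper might look like the following; for brevity it takes two pre-built towers instead of a cfg object, and the concatenation strategy is only one possible choice:

```python
import torch
import torch.nn as nn

class MoF(nn.Module):
    """Hypothetical mixture-of-features module: wraps two vision backbones and
    concatenates their patch features along the channel dimension. Illustrative only."""

    def __init__(self, tower_a: nn.Module, tower_b: nn.Module):
        super().__init__()
        self.tower_a = tower_a  # e.g. a CLIP vision encoder returning patch features
        self.tower_b = tower_b  # e.g. a DINO vision encoder returning patch features

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feat_a = self.tower_a(images)               # (batch, num_patches, dim_a)
        feat_b = self.tower_b(images)               # (batch, num_patches, dim_b)
        return torch.cat([feat_a, feat_b], dim=-1)  # (batch, num_patches, dim_a + dim_b)
```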
If you want to add a new connector, you need to implement a new connector class that inherits from the base class Connector. Here is an example of the Linear connector.
First, create tinyllava/model/connector/linear.py.
import torch.nn as nn

from . import register_connector
from .base import Connector

@register_connector('linear')  # Enable the ConnectorFactory to obtain the added connector by this string ('linear').
class LinearConnector(Connector):
    def __init__(self, config):
        super().__init__()
        self._connector = nn.Linear(config.vision_hidden_size, config.hidden_size)  # define your connector model

Then, modify your training script with the corresponding CN_VERSION.
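For a quick sanity check of the shapes involved, here is a hypothetical usage of such a linear connector; the config values and patch count below are assumptions chosen only for illustration:

```python
import torch
import torch.nn as nn
from types import SimpleNamespace

# Hypothetical config values; real ones come from your model config.
config = SimpleNamespace(vision_hidden_size=1152, hidden_size=2560)

connector = nn.Linear(config.vision_hidden_size, config.hidden_size)

vision_features = torch.randn(2, 729, config.vision_hidden_size)  # (batch, num_patches, vision_hidden_size)
projected = connector(vision_features)                            # (batch, num_patches, hidden_size)
print(projected.shape)  # torch.Size([2, 729, 2560])
```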
We give special thanks to Lei Zhao, Luche Wang, Kaijun Luo, and Junchen Wang for building the demo.
If you have any questions, feel free to open an issue or contact us (WeChat ID: TinyLLaVA).
If you find our paper and code useful in your research, please consider giving a star and a citation.
@misc{zhou2024tinyllava,
      title={TinyLLaVA: A Framework of Small-scale Large Multimodal Models},
      author={Baichuan Zhou and Ying Hu and Xi Weng and Junlong Jia and Jie Luo and Xien Liu and Ji Wu and Lei Huang},
      year={2024},
      eprint={2402.14289},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

@article{jia2024tinyllava,
      title={TinyLLaVA Factory: A Modularized Codebase for Small-scale Large Multimodal Models},
      author={Jia, Junlong and Hu, Ying and Weng, Xi and Shi, Yiming and Li, Miao and Zhang, Xingjian and Zhou, Baichuan and Liu, Ziyu and Luo, Jie and Huang, Lei and Wu, Ji},
      journal={arXiv preprint arXiv:2405.11788},
      year={2024}
}