Train a multi-modal chatbot with visual and language instructions!
Based on the open-source multi-modal model OpenFlamingo, we create various visual instruction data from open datasets, including VQA, Image Captioning, Visual Reasoning, Text OCR, and Visual Dialogue. We also train the language model component of OpenFlamingo with language-only instruction data.
Joint training on visual and language instructions effectively improves the model's performance! For more details, please refer to our technical report.
We welcome you to join us!
To install the package in an existing environment, run

```bash
git clone https://github.com/open-mmlab/Multimodal-GPT.git
cd Multimodal-GPT
pip install -r requirements.txt
pip install -v -e .
```

or create a new conda environment:

```bash
conda env create -f environment.yml
```
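To quickly verify the installation, you can try importing the package (a minimal check; it assumes the project installs a Python module named `mmgpt`, the package used by the training entry point below):

```bash
# Optional sanity check: confirm the editable install is importable.
# Assumes the package is exposed as the `mmgpt` Python module.
python -c "import mmgpt; print('mmgpt installed successfully')"
```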
Download the pre-trained weights.

1. Use this script to convert the LLaMA weights to Hugging Face format.
2. Download the OpenFlamingo pre-trained model from openflamingo/OpenFlamingo-9B.
3. Download our LoRA weight from here.
Then place these models in the `checkpoints` folder like this:
```
checkpoints
├── llama-7b_hf
│   ├── config.json
│   ├── pytorch_model-00001-of-00002.bin
│   ├── ......
│   └── tokenizer.model
├── OpenFlamingo-9B
│   └── checkpoint.pt
└── mmgpt-lora-v0-release.pt
```
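The commands below are one possible way to produce this layout, not part of the official instructions. They assume you already have the original LLaMA-7B weights locally, a clone of the `transformers` repository (which ships the `convert_llama_weights_to_hf.py` script referenced above), and `git-lfs` for pulling the OpenFlamingo checkpoint; the LoRA weight still has to be downloaded manually from the release link above. All `/path/to/...` values are placeholders.

```bash
mkdir -p checkpoints

# Convert the original LLaMA-7B weights to Hugging Face format.
# The conversion script ships with the `transformers` repository;
# its location may differ between versions.
python transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/original/llama \
    --model_size 7B \
    --output_dir checkpoints/llama-7b_hf

# Pull the OpenFlamingo-9B checkpoint from the Hugging Face Hub (needs git-lfs).
git lfs install
git clone https://huggingface.co/openflamingo/OpenFlamingo-9B checkpoints/OpenFlamingo-9B

# Place the manually downloaded LoRA weight at the expected path.
mv /path/to/mmgpt-lora-v0-release.pt checkpoints/mmgpt-lora-v0-release.pt
```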
Launch the Gradio demo:

```bash
python app.py
```



Prepare the following datasets for instruction fine-tuning.

**A-OKVQA**

Download the annotations from this link and unzip them to `data/aokvqa/annotations`.
It also requires images from the COCO dataset, which can be downloaded from here.

**COCO Caption**

Download from this link and unzip to `data/coco`.
It also requires images from the COCO dataset, which can be downloaded from here.

**OCR VQA**

Download from this link and place it in `data/OCR_VQA/`.

**LLaVA**

Download from liuhaotian/LLaVA-Instruct-150K and place it in `data/llava/`.
It also requires images from the COCO dataset, which can be downloaded from here.

**MiniGPT-4**

Download from Vision-CAIR/cc_sbu_align and place it in `data/cc_sbu_align/`.

**Dolly 15k**

Download from databricks/databricks-dolly-15k and place it in `data/dolly/databricks-dolly-15k.jsonl`.

**Alpaca GPT4**

Download it from this link and place it in `data/alpaca_gpt4/alpaca_gpt4_data.json`.
You can also customize the data paths in `configs/dataset_config.py`.

**Baize**

Download it from this link and place it in `data/baize/quora_chat_data.json`.
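As a convenience, the sketch below creates the directory layout expected by the paths listed above; the annotation files, JSON/JSONL files, and COCO images themselves still have to be downloaded from the respective links and placed into these folders.

```bash
# Create the data directories referenced above; the downloaded annotations,
# images, and JSON/JSONL files go into these folders.
mkdir -p data/aokvqa/annotations \
         data/coco \
         data/OCR_VQA \
         data/llava \
         data/cc_sbu_align \
         data/dolly \
         data/alpaca_gpt4 \
         data/baize
```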
Start the instruction fine-tuning (the example below uses 8 GPUs):

```bash
torchrun --nproc_per_node=8 mmgpt/train/instruction_finetune.py \
  --lm_path checkpoints/llama-7b_hf \
  --tokenizer_path checkpoints/llama-7b_hf \
  --pretrained_path checkpoints/OpenFlamingo-9B/checkpoint.pt \
  --run_name train-my-gpt4 \
  --learning_rate 1e-5 \
  --lr_scheduler cosine \
  --batch_size 1 \
  --tuning_config configs/lora_config.py \
  --dataset_config configs/dataset_config.py \
  --report_to_wandb
```

If you find our project useful for your research and applications, please cite using this BibTeX:
```bibtex
@misc{gong2023multimodalgpt,
      title={MultiModal-GPT: A Vision and Language Model for Dialogue with Humans},
      author={Tao Gong and Chengqi Lyu and Shilong Zhang and Yudong Wang and Miao Zheng and Qian Zhao and Kuikun Liu and Wenwei Zhang and Ping Luo and Kai Chen},
      year={2023},
      eprint={2305.04790},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```