Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models


Official implementation of the paper "Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models", presented at IEEE-RAS Humanoids 2024.

This repository contains the implementation of the LLM/VLM part of the project. The multi-contact whole-body controller is maintained in a separate repository.

For more details, see the project website.

Table of Contents
- Repository Structure
- Prerequisites
- Installation
- Usage
  - Setup
  - Running the Docker Container
  - Quick Start
  - Command-Line Options
  - Using Local LLMs
- Contact
- Citing Words2Contact
- Acknowledgments

Repository Structure

 .
├── .ci/                       # Docker configurations
│   └── Dockerfile             # Dockerfile to build the project's container
├── config/                    # Configuration files for models
│   └── GroundingDINO_SwinT_OGC.py # GroundingDINO configuration
├── data/                      # Test data and outputs
│   ├── test.png               # Example input image
│   └── test_output.png        # Example output image
├── media/                     # Media assets
│   ├── ack.png                # Acknowledgment image
│   └── concept_figure_wide.png # Conceptual figure for the project
├── submodules/                # External submodules
│   └── CLIP_Surgery/          # CLIP Surgery code and resources
├── words2contact/             # Core project source code
│   ├── grammar/               # Grammars for constraining language models
│   │   ├── classifier.gbnf    # Grammar for classifying outputs
│   │   └── README.md          # Grammar module documentation
│   ├── prompts/               # Prompts for LLMs
│   │   └── prompts.json       # JSON file with pre-defined prompts
│   ├── geom_utils.py          # Utilities for geometric calculations
│   ├── math_pars.py           # Parsing mathematical expressions
│   ├── saygment.py            # Language-grounded segmentation
│   ├── words2contacts.py      # Core script for Words2Contact
│   └── yello.py               # Language-grounded object detection
├── main.py                    # Entry point for the project
├── launch.sh                  # Docker launch script
├── object_detection.py        # Object detection testing
├── object_segmentation.py     # Object segmentation testing
└── README.md                  # Documentation (this file)

Prerequisites

Before you begin, make sure you have the following:

- Docker
- NVIDIA Container Toolkit (if using a GPU, which is recommended)
- An OpenAI API key (if using GPT-based LLMs). You can get one from OpenAI.
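
As a quick pre-flight check for the list above, the following minimal sketch reports whether the relevant command-line tools are on the `PATH`. The helper name `check_prerequisites` is hypothetical and not part of this repository, and finding `nvidia-smi` is only a rough hint that an NVIDIA driver is installed; it does not confirm that the NVIDIA Container Toolkit is configured.

```python
import shutil


def check_prerequisites() -> dict:
    """Report which prerequisite tools can be found on PATH.

    Hypothetical helper for illustration; not part of the Words2Contact code.
    """
    return {
        # The Docker CLI is needed to build the image and launch the container.
        "docker": shutil.which("docker") is not None,
        # Rough hint only: nvidia-smi ships with the NVIDIA driver, not with
        # the Container Toolkit, so True here does not prove GPU passthrough.
        "nvidia_smi": shutil.which("nvidia-smi") is not None,
    }
```

Calling `check_prerequisites()` returns a dictionary of booleans, which can be printed before building the image.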

Installation

Only Docker is supported for now; Conda and pip installation will be added soon.

Clone the repository:

git clone https://github.com/hucebot/words2contact.git
cd words2contact

Build the Docker image:

docker build -t words2contact -f .ci/Dockerfile .

Usage

Setup

If you plan to use OpenAI's GPT-based LLMs, set your API key as an environment variable before launching the Docker container:

export OPENAI_KEY=<your_openai_api_key>
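
A script that wraps the GPT-based path may want to fail early when the variable is missing. The sketch below shows one way to do that; `require_openai_key` is a hypothetical helper, and how the project itself reads `OPENAI_KEY` is not shown in this README.

```python
import os


def require_openai_key(env=os.environ) -> str:
    """Return the OPENAI_KEY value, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of the Words2Contact code.
    """
    key = env.get("OPENAI_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_KEY is not set; export it before launching the container."
        )
    return key
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.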

Running the Docker Container

Run the following command to start the container:

bash launch.sh

This creates a models/ folder in the project root, where models are downloaded and stored.

Quick Start

To test Words2Contact with the provided example image:

python main.py --image_path data/test.png --prompt "Place your hand above the red bowl."

The output will be saved as data/test_output.png.
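
The quick-start command can also be launched from Python, which avoids shell quoting issues when the prompt contains spaces or punctuation. This is a minimal sketch: `build_command` and `run_words2contact` are hypothetical wrappers, and the code assumes it runs from the repository root inside the container, where `main.py` is available.

```python
import subprocess
import sys


def build_command(image_path: str, prompt: str) -> list:
    """Build the argument list for main.py (hypothetical wrapper).

    Passing a list to subprocess means the prompt needs no shell quoting.
    """
    return [sys.executable, "main.py", "--image_path", image_path, "--prompt", prompt]


def run_words2contact(image_path: str, prompt: str) -> subprocess.CompletedProcess:
    """Run main.py and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(image_path, prompt), check=True)
```

For example, `run_words2contact("data/test.png", "Place your hand above the red bowl.")` issues the same command as the quick start.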

More examples coming soon!

Command-Line Options

 usage: main.py [-h] [--image_path IMAGE_PATH] [--prompt PROMPT] [--use_gpt] [--yello_vlm YELLO_VLM] [--output_path OUTPUT_PATH] [--llm_path LLM_PATH] [--chat_template CHAT_TEMPLATE]

Run Words2Contact with an image and a text prompt.

options:
  -h, --help                    show this help message and exit
  --image_path IMAGE_PATH       Path to the input image file. Default: 'data/test.png'.
  --prompt PROMPT               Text prompt for Words2Contact. Default: 'Place your hand above the red bowl.'.
  --use_gpt                     Use OpenAI API for the LLM (requires `OPENAI_KEY`).
  --yello_vlm YELLO_VLM         Model to use for YELLO VLM. Default: 'GroundingDINO'.
  --output_path OUTPUT_PATH     Path to save the output image. Default: 'data/test_output.png'.
  --llm_path LLM_PATH           Path to the `.gguf` LLM model weights.
  --chat_template CHAT_TEMPLATE Chat template to use for local LLMs. Default: 'ChatML'.
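
For readers who want to see how such an interface is typically declared, here is an `argparse` sketch reconstructed only from the help text above. It is not the actual contents of `main.py`; in particular, the `None` default for `--llm_path` is an assumption, since the help text states no default for it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Reconstruct the documented CLI (illustrative, not the real main.py)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Run Words2Contact with an image and a text prompt.",
    )
    parser.add_argument("--image_path", default="data/test.png",
                        help="Path to the input image file.")
    parser.add_argument("--prompt", default="Place your hand above the red bowl.",
                        help="Text prompt for Words2Contact.")
    # store_true makes --use_gpt a flag that defaults to False.
    parser.add_argument("--use_gpt", action="store_true",
                        help="Use OpenAI API for the LLM (requires OPENAI_KEY).")
    parser.add_argument("--yello_vlm", default="GroundingDINO",
                        help="Model to use for YELLO VLM.")
    parser.add_argument("--output_path", default="data/test_output.png",
                        help="Path to save the output image.")
    # Assumption: no default is documented, so None is used here.
    parser.add_argument("--llm_path", default=None,
                        help="Path to the .gguf LLM model weights.")
    parser.add_argument("--chat_template", default="ChatML",
                        help="Chat template to use for local LLMs.")
    return parser
```

Calling `build_parser().parse_args([])` yields the documented defaults, which is a convenient way to check a wrapper script against the help text.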

Using Local LLMs

1. Download the .gguf weights for a local LLM from a trusted source (e.g., TheBloke's models on Hugging Face).
2. Place the weights in the models/ folder.
3. Specify the --llm_path argument when running the script:
```
python main.py --image_path data/test.png --llm_path models/local_model.gguf
```
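
Before passing a downloaded file to `--llm_path`, it can help to confirm that it is a GGUF file at all, for example to catch an interrupted download or an HTML error page saved under a `.gguf` name. GGUF files begin with the four ASCII bytes `GGUF`, so a minimal check reads just the header. The helper name `looks_like_gguf` is hypothetical, and this sketch does not validate the rest of the file.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def looks_like_gguf(path) -> bool:
    """Return True if the file exists and starts with the GGUF magic bytes.

    Hypothetical helper for illustration; it checks the header only.
    """
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC
```

For example, `looks_like_gguf("models/local_model.gguf")` should return `True` for valid weights placed in the models/ folder.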

Contact

For questions or support, please contact:

Dionis Totsila: [email protected]

Citing Words2Contact

If you use Words2Contact, our dataset, or part of this code in your research, please cite our paper:

@INPROCEEDINGS{10769902,
  author = {Totsila, Dionis and Rouxel, Quentin and Mouret, Jean-Baptiste and Ivaldi, Serena},
  booktitle = {2024 IEEE-RAS 23rd International Conference on Humanoid Robots (Humanoids)},
  title = {Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models},
  year = {2024},
  volume = {},
  number = {},
  pages = {9-16},
  keywords = {Accuracy;Large language models;Pipelines;Natural languages;Humanoid robots;Transforms;Benchmark testing;Iterative methods;Surface treatment},
  doi = {10.1109/Humanoids58906.2024.10769902}
}

Acknowledgments

This research was supported by:

- CPER CyberEntreprises
- Creativ'Lab platform of Inria/LORIA
- EU Horizon project euROBIN (GA n.101070596)
- France 2030 program through the PEPR O2R projects AS3 and PI3 (ANR-22-EXOD-007, ANR-22-EXOD-004)