words2contact

Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models

Official implementation of the paper "Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models", presented at IEEE-RAS Humanoids 2024.

This repository contains the implementation of the LLM/VLM part of the project. Visit this repo.

For more information, see the paper website.

Table of Contents

Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models
- Table of Contents
- Repository Structure
- Prerequisites
- Installation
- Usage
  - Setup
  - Launching the Docker Container
  - Quick Start
  - Command-Line Options
  - Using Local LLMs
- Contact
- Citing Words2Contact
- Acknowledgments

Repository Structure

 .
├── .ci/                       # Docker configurations
│   └── Dockerfile             # Dockerfile to build the project's container
├── config/                    # Configuration files for models
│   └── GroundingDINO_SwinT_OGC.py # GroundingDINO configuration
├── data/                      # Test data and outputs
│   ├── test.png               # Example input image
│   └── test_output.png        # Example output image
├── media/                     # Media assets
│   ├── ack.png                # Acknowledgment image
│   └── concept_figure_wide.png # Conceptual figure for the project
├── submodules/                # External submodules
│   └── CLIP_Surgery/          # CLIP Surgery code and resources
├── words2contact/             # Core project source code
│   ├── grammar/               # Grammars for constraining language models
│   │   ├── classifier.gbnf    # Grammar for classifying outputs
│   │   └── README.md          # Grammar module documentation
│   ├── prompts/               # Prompts for LLMs
│   │   └── prompts.json       # JSON file with pre-defined prompts
│   ├── geom_utils.py          # Utilities for geometric calculations
│   ├── math_pars.py           # Parsing mathematical expressions
│   ├── saygment.py            # Language-grounded segmentation
│   ├── words2contacts.py      # Core script for Words2Contact
│   └── yello.py               # Language-grounded object detection
├── main.py                    # Entry point for the project
├── launch.sh                  # Docker launch script
├── object_detection.py        # Object detection testing
├── object_segmentation.py     # Object segmentation testing
└── README.md                  # Documentation (this file)

Prerequisites

Before you start, make sure you have the following:

- Docker
- NVIDIA Container Toolkit (if using a GPU, which is recommended)
- An OpenAI API key (if using GPT-based LLMs). You can get one from OpenAI.

Installation

At the moment only Docker is supported; Conda and pip installations will be added soon.

Clone the repository:

git clone https://github.com/hucebot/words2contact.git
cd words2contact

Build the Docker image:

docker build -t words2contact -f .ci/Dockerfile .

Usage

Setup

If you plan to use OpenAI's GPT-based LLMs, set your API key as an environment variable before launching the Docker container:

export OPENAI_KEY=<your_openai_api_key>
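
Before launching the container, it can help to confirm that the variable is actually visible in the current environment. The following is a minimal sketch and not part of the repository: the helper name `openai_key_is_set` is hypothetical, and the only name taken from this README is the `OPENAI_KEY` variable.

```python
import os


def openai_key_is_set(env=None):
    """Return True if OPENAI_KEY is present and non-empty.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_KEY", "").strip())


if __name__ == "__main__":
    if openai_key_is_set():
        print("OPENAI_KEY is set.")
    else:
        # The README states that --use_gpt requires OPENAI_KEY.
        print("OPENAI_KEY is missing; set it before using --use_gpt.")
```

Run it in the same shell session in which you exported the variable, since an export in one shell is not visible in another.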

Launching the Docker Container

Run the following command to launch the container:

bash launch.sh

This creates a models/ folder in the project root, where models are downloaded and stored.

Quick Start

To test Words2Contact with the provided example image:

python main.py --image_path data/test.png --prompt "Place your hand above the red bowl."

The output is saved as data/test_output.png.

More examples coming soon!
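
If you want to drive the quick-start invocation from a script (for example to loop over several prompts), the command can be assembled in Python. This is a sketch under stated assumptions: the function name `build_quickstart_command` is hypothetical, the flags and default values are the ones listed in this README, and the script only prints the command instead of running it.

```python
import shlex
import sys


def build_quickstart_command(image_path="data/test.png",
                             prompt="Place your hand above the red bowl.",
                             output_path="data/test_output.png",
                             use_gpt=False):
    """Build the argument list for main.py using flags documented in the README."""
    cmd = [
        sys.executable, "main.py",
        "--image_path", image_path,
        "--prompt", prompt,
        "--output_path", output_path,
    ]
    if use_gpt:
        # Documented as requiring the OPENAI_KEY environment variable.
        cmd.append("--use_gpt")
    return cmd


if __name__ == "__main__":
    cmd = build_quickstart_command()
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(cmd))
    # To execute it from the repository root inside the container:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with prompts that contain spaces or punctuation.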

Command-Line Options

 usage: main.py [-h] [--image_path IMAGE_PATH] [--prompt PROMPT] [--use_gpt] [--yello_vlm YELLO_VLM] [--output_path OUTPUT_PATH] [--llm_path LLM_PATH] [--chat_template CHAT_TEMPLATE]

Run Words2Contact with an image and a text prompt.

options:
  -h, --help                    show this help message and exit
  --image_path IMAGE_PATH       Path to the input image file. Default: 'data/test.png'.
  --prompt PROMPT               Text prompt for Words2Contact. Default: 'Place your hand above the red bowl.'.
  --use_gpt                     Use OpenAI API for the LLM (requires `OPENAI_KEY`).
  --yello_vlm YELLO_VLM         Model to use for YELLO VLM. Default: 'GroundingDINO'.
  --output_path OUTPUT_PATH     Path to save the output image. Default: 'data/test_output.png'.
  --llm_path LLM_PATH           Path to the `.gguf` LLM model weights.
  --chat_template CHAT_TEMPLATE Chat template to use for local LLMs. Default: 'ChatML'.
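
For readers who want to see how these options fit together, the help text above can be approximated with `argparse`. This is a reconstruction from the usage message only and may differ from the actual main.py; in particular, the default of `None` for `--llm_path` is an assumption, since the help text does not state one.

```python
import argparse


def build_parser():
    """Approximate the documented CLI of main.py (reconstructed from its help text)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Run Words2Contact with an image and a text prompt.",
    )
    parser.add_argument("--image_path", default="data/test.png",
                        help="Path to the input image file.")
    parser.add_argument("--prompt", default="Place your hand above the red bowl.",
                        help="Text prompt for Words2Contact.")
    parser.add_argument("--use_gpt", action="store_true",
                        help="Use OpenAI API for the LLM (requires OPENAI_KEY).")
    parser.add_argument("--yello_vlm", default="GroundingDINO",
                        help="Model to use for YELLO VLM.")
    parser.add_argument("--output_path", default="data/test_output.png",
                        help="Path to save the output image.")
    # Assumption: no default is documented for --llm_path, so None is used here.
    parser.add_argument("--llm_path", default=None,
                        help="Path to the .gguf LLM model weights.")
    parser.add_argument("--chat_template", default="ChatML",
                        help="Chat template to use for local LLMs.")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the demo does not depend on sys.argv.
    args = build_parser().parse_args(["--use_gpt"])
    print(vars(args))
```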

Using Local LLMs

Download .gguf weights for local LLMs from a trusted source (e.g. TheBloke's Hugging Face models).
Place the weights in the models/ folder.

Specify the --llm_path argument when running the script:

python main.py --image_path data/test.png --llm_path models/local_model.gguf
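
To check which weights are available before choosing a value for `--llm_path`, the models/ folder can be scanned for `.gguf` files. This is a minimal sketch, not part of the repository: the helper name `find_gguf_models` is hypothetical, and it assumes the weights sit directly inside models/ rather than in subfolders.

```python
from pathlib import Path


def find_gguf_models(models_dir="models"):
    """Return the .gguf files directly inside models_dir, sorted by path.

    Returns an empty list if the folder does not exist yet.
    """
    root = Path(models_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir()
                  if p.is_file() and p.suffix == ".gguf")


if __name__ == "__main__":
    models = find_gguf_models()
    if not models:
        print("No .gguf weights found in models/.")
    for path in models:
        # Print a ready-to-use invocation for each discovered model file.
        print(f"python main.py --image_path data/test.png --llm_path {path}")
```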

Contact

For questions or support, please contact:

Dionis Totsila : [email protected]

Citing Words2Contact

If you use Words2Contact, our dataset, or any part of this code in your research, please cite our paper:

@INPROCEEDINGS{10769902,
  author={Totsila, Dionis and Rouxel, Quentin and Mouret, Jean-Baptiste and Ivaldi, Serena},
  booktitle={2024 IEEE-RAS 23rd International Conference on Humanoid Robots (Humanoids)},
  title={Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models},
  year={2024},
  volume={},
  number={},
  pages={9-16},
  keywords={Accuracy;Large language models;Pipelines;Natural languages;Humanoid robots;Transforms;Benchmark testing;Iterative methods;Surface treatment},
  doi={10.1109/Humanoids58906.2024.10769902}}

Acknowledgments

This research was supported by:

- CPER CyberEntreprises
- Creativ'Lab platform of Inria/LORIA
- EU Horizon project euROBIN (GA n.101070596)
- France 2030 program through the PEPR O2R projects AS3 and PI3 (ANR-22-EXOD-007, ANR-22-EXOD-004)