Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models

Official implementation of the paper "Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models", presented at IEEE-RAS Humanoids 2024.

This repository contains the implementation of the LLM/VLM part of the project. For the multi-contact whole-body controller, please visit this repo.

For more details, visit the paper website.

Table of Contents

Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models
- Table of Contents
- Repository Structure
- Prerequisites
- Installation
- Usage
  - Setup
  - Launch the Docker Container
  - Quick Start
  - Command-Line Options
  - Using Local LLMs
- Contact
- Citing Words2Contact
- Acknowledgments

Repository Structure

 .
├── .ci/                       # Docker configurations
│   └── Dockerfile             # Dockerfile to build the project's container
├── config/                    # Configuration files for models
│   └── GroundingDINO_SwinT_OGC.py # GroundingDINO configuration
├── data/                      # Test data and outputs
│   ├── test.png               # Example input image
│   └── test_output.png        # Example output image
├── media/                     # Media assets
│   ├── ack.png                # Acknowledgment image
│   └── concept_figure_wide.png # Conceptual figure for the project
├── submodules/                # External submodules
│   └── CLIP_Surgery/          # CLIP Surgery code and resources
├── words2contact/             # Core project source code
│   ├── grammar/               # Grammars for constraining language models
│   │   ├── classifier.gbnf    # Grammar for classifying outputs
│   │   └── README.md          # Grammar module documentation
│   ├── prompts/               # Prompts for LLMs
│   │   └── prompts.json       # JSON file with pre-defined prompts
│   ├── geom_utils.py          # Utilities for geometric calculations
│   ├── math_pars.py           # Parsing mathematical expressions
│   ├── saygment.py            # Language-grounded segmentation
│   ├── words2contacts.py      # Core script for Words2Contact
│   └── yello.py               # Language-grounded object detection
├── main.py                    # Entry point for the project
├── launch.sh                  # Docker launch script
├── object_detection.py        # Object detection testing
├── object_segmentation.py     # Object segmentation testing
└── README.md                  # Documentation (this file)

Prerequisites

Before you begin, make sure you have the following:

- Docker
- NVIDIA Container Toolkit (if you are using a GPU, which is recommended)
- An OpenAI API key (if you are using GPT-based LLMs). You can obtain one from OpenAI.

Installation

Currently only Docker is supported; Conda and pip installations will be added soon.

Clone the repository:

git clone https://github.com/hucebot/words2contact.git
cd words2contact

Build the Docker image:

docker build -t words2contact -f .ci/Dockerfile .

Usage

Setup

If you plan to use OpenAI's GPT-based LLMs, set your API key as an environment variable before launching the Docker container:

export OPENAI_KEY=<your_openai_api_key>
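A small pre-flight check can catch a missing key before anything is launched. The sketch below is an illustration only and is not part of the repository: the helper name `require_openai_key` is hypothetical, and the only thing taken from this README is the variable name `OPENAI_KEY`.

```python
import os


def require_openai_key(env=None):
    """Return the OpenAI key from the environment, or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    The helper itself is hypothetical; only the OPENAI_KEY name comes from the README.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_KEY is not set; export it before launching the container")
    return key
```

Calling `require_openai_key()` at the top of a wrapper script would fail early with a readable message instead of surfacing later as an API error.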

Launch the Docker Container

Run the following command to start the container:

bash launch.sh

This creates a models/ folder in the project root, where the models are downloaded and stored.

Quick Start

To test Words2Contact with the provided example image:

python main.py --image_path data/test.png --prompt "Place your hand above the red bowl."

The output will be saved to data/test_output.png.
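To run the same command from a script (for example, to try several prompts on one image), the argument list can be assembled programmatically. This is a minimal sketch under stated assumptions: `build_command` is a hypothetical helper, not something the repository provides, and it uses only flags documented in this README.

```python
import shlex
import sys


def build_command(image_path, prompt, output_path=None, use_gpt=False):
    """Assemble the argument list for main.py (hypothetical helper).

    Only flags documented in this README are used.
    """
    cmd = [sys.executable, "main.py", "--image_path", image_path, "--prompt", prompt]
    if output_path is not None:
        cmd += ["--output_path", output_path]
    if use_gpt:
        cmd.append("--use_gpt")
    return cmd


if __name__ == "__main__":
    cmd = build_command("data/test.png", "Place your hand above the red bowl.")
    # Print a shell-quoted version of the command for inspection.
    # To actually execute it, pass `cmd` to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing the prompt as a single list element avoids shell-quoting problems with spaces and punctuation in the instruction text.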

More examples coming soon!

Command-Line Options

 usage: main.py [-h] [--image_path IMAGE_PATH] [--prompt PROMPT] [--use_gpt] [--yello_vlm YELLO_VLM] [--output_path OUTPUT_PATH] [--llm_path LLM_PATH] [--chat_template CHAT_TEMPLATE]

Run Words2Contact with an image and a text prompt.

options:
  -h, --help                    show this help message and exit
  --image_path IMAGE_PATH       Path to the input image file. Default: 'data/test.png'.
  --prompt PROMPT               Text prompt for Words2Contact. Default: 'Place your hand above the red bowl.'.
  --use_gpt                     Use OpenAI API for the LLM (requires `OPENAI_KEY`).
  --yello_vlm YELLO_VLM         Model to use for YELLO VLM. Default: 'GroundingDINO'.
  --output_path OUTPUT_PATH     Path to save the output image. Default: 'data/test_output.png'.
  --llm_path LLM_PATH           Path to the `.gguf` LLM model weights.
  --chat_template CHAT_TEMPLATE Chat template to use for local LLMs. Default: 'ChatML'.
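For readers who want to wrap or test this interface, the help text above can be mirrored with `argparse`. The following is a reconstruction from the documented flags and defaults, not the repository's actual `main.py`; in particular, the `None` default for `--llm_path` is an assumption, since the help text does not state one.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (a reconstruction, not main.py itself)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Run Words2Contact with an image and a text prompt.",
    )
    parser.add_argument("--image_path", default="data/test.png",
                        help="Path to the input image file.")
    parser.add_argument("--prompt", default="Place your hand above the red bowl.",
                        help="Text prompt for Words2Contact.")
    parser.add_argument("--use_gpt", action="store_true",
                        help="Use OpenAI API for the LLM (requires OPENAI_KEY).")
    parser.add_argument("--yello_vlm", default="GroundingDINO",
                        help="Model to use for YELLO VLM.")
    parser.add_argument("--output_path", default="data/test_output.png",
                        help="Path to save the output image.")
    # Assumption: no default is documented for --llm_path, so None is used here.
    parser.add_argument("--llm_path", default=None,
                        help="Path to the .gguf LLM model weights.")
    parser.add_argument("--chat_template", default="ChatML",
                        help="Chat template to use for local LLMs.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```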

Using Local LLMs

Download .gguf weights for a local LLM from a trusted source (e.g., TheBloke's Hugging Face models).
Place the weights in the models/ folder.

Specify the --llm_path argument when running the script:

python main.py --image_path data/test.png --llm_path models/local_model.gguf
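Before launching, it can help to confirm that the weights are actually where `--llm_path` will point. The sketch below is an assumption-laden convenience, not part of the repository: `find_gguf_models` is a hypothetical helper, and it only lists `.gguf` files placed directly inside the `models/` folder described above.

```python
from pathlib import Path


def find_gguf_models(models_dir="models"):
    """Return the sorted .gguf files found directly inside models_dir (hypothetical helper)."""
    root = Path(models_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix == ".gguf")


if __name__ == "__main__":
    models = find_gguf_models()
    if not models:
        print("No .gguf weights found in models/")
    for path in models:
        # Show the command that would use each set of weights.
        print(f"python main.py --image_path data/test.png --llm_path {path}")
```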

Contact

For questions or support, please contact:

Dionis Totsila: [email protected]

Citing Words2Contact

If you use Words2Contact, our dataset, or part of this code in your research, please cite our paper:

@INPROCEEDINGS{10769902,
  author = {Totsila, Dionis and Rouxel, Quentin and Mouret, Jean-Baptiste and Ivaldi, Serena},
  booktitle = {2024 IEEE-RAS 23rd International Conference on Humanoid Robots (Humanoids)},
  title = {Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models},
  year = {2024},
  volume = {},
  number = {},
  pages = {9-16},
  keywords = {Accuracy;Large language models;Pipelines;Natural languages;Humanoid robots;Transforms;Benchmark testing;Iterative methods;Surface treatment},
  doi = {10.1109/Humanoids58906.2024.10769902}}

Acknowledgments

This research was supported by:

- CPER CYBERENTREPRISES
- Creativ'Lab platform of Inria/LORIA
- EU Horizon project euROBIN (GA n.101070596)
- The France 2030 program through the PEPR O2R projects AS3 and PI3 (ANR-22-EXOD-007, ANR-22-EXOD-004)