Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models

Official implementation of the paper "Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models", presented at IEEE-RAS Humanoids 2024.

This repository contains the LLM/VLM part of the project. For the multi-contact whole-body controller, please visit this repo.

For more details, please visit the paper website.

Table of Contents

Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models
- Table of Contents
- Repository Structure
- Prerequisites
- Installation
- Usage
  - Setup
  - Launch the Docker Container
  - Quick Start
  - Command-Line Options
  - Using Local LLMs
- Contact
- Citing Words2Contact
- Acknowledgments

Repository Structure

 .
├── .ci/                       # Docker configurations
│   └── Dockerfile             # Dockerfile to build the project's container
├── config/                    # Configuration files for models
│   └── GroundingDINO_SwinT_OGC.py # GroundingDINO configuration
├── data/                      # Test data and outputs
│   ├── test.png               # Example input image
│   └── test_output.png        # Example output image
├── media/                     # Media assets
│   ├── ack.png                # Acknowledgment image
│   └── concept_figure_wide.png # Conceptual figure for the project
├── submodules/                # External submodules
│   └── CLIP_Surgery/          # CLIP Surgery code and resources
├── words2contact/             # Core project source code
│   ├── grammar/               # Grammars for constraining language models
│   │   ├── classifier.gbnf    # Grammar for classifying outputs
│   │   └── README.md          # Grammar module documentation
│   ├── prompts/               # Prompts for LLMs
│   │   └── prompts.json       # JSON file with pre-defined prompts
│   ├── geom_utils.py          # Utilities for geometric calculations
│   ├── math_pars.py           # Parsing mathematical expressions
│   ├── saygment.py            # Language-grounded segmentation
│   ├── words2contacts.py      # Core script for Words2Contact
│   └── yello.py               # Language-grounded object detection
├── main.py                    # Entry point for the project
├── launch.sh                  # Docker launch script
├── object_detection.py        # Object detection testing
├── object_segmentation.py     # Object segmentation testing
└── README.md                  # Documentation (this file)

Prerequisites

Before you begin, make sure you have the following:

- Docker
- NVIDIA Container Toolkit (if using a GPU, which is recommended)
- An OpenAI API key (if using GPT-based LLMs). You can obtain one from OpenAI.
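
As a quick sanity check of the first two prerequisites, the sketch below looks for the relevant command-line tools on `PATH`. It is not part of the repository. It assumes that the `nvidia-ctk` binary, which ships with recent NVIDIA Container Toolkit releases, is a reasonable hint that the toolkit is installed; older installations may not provide it.

```python
import shutil


def check_prerequisites():
    """Report which container tools appear to be available on PATH."""
    return {
        # Docker CLI
        "docker": shutil.which("docker") is not None,
        # Assumption: nvidia-ctk on PATH indicates the NVIDIA Container Toolkit
        "nvidia_container_toolkit": shutil.which("nvidia-ctk") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A missing `nvidia_container_toolkit` entry only matters if you intend to run on a GPU.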

Installation

For now, only Docker is supported. Conda and pip installation will be added soon.

Clone the repository:

git clone https://github.com/hucebot/words2contact.git
cd words2contact

Build the Docker image:
```
docker build -t words2contact -f .ci/Dockerfile . 
```

Usage

Setup

If you plan to use OpenAI's GPT-based LLMs, set your API key as an environment variable before launching the Docker container:

export OPENAI_KEY=<your_openai_api_key>
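
To fail early when the key is missing, a wrapper script could check the variable before any model is loaded. The helper below is a hypothetical sketch, not code from this repository; only the variable name `OPENAI_KEY` and the `--use_gpt` flag come from this README.

```python
import os


def require_openai_key(use_gpt, env=None):
    """Return the API key when GPT is requested, None otherwise.

    Raises RuntimeError if GPT is requested but OPENAI_KEY is unset or empty.
    """
    env = os.environ if env is None else env
    if not use_gpt:
        return None  # local LLMs do not need the key
    key = env.get("OPENAI_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_KEY is not set; export it before launching the container."
        )
    return key
```

For example, `require_openai_key(use_gpt=True)` raises immediately if the export step above was skipped.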

Launch the Docker Container

Run the following command to start the container:

bash launch.sh

This creates a models/ folder in the project root, where the models are downloaded and stored.

Quick Start

To test Words2Contact with the provided example image:

python main.py --image_path data/test.png --prompt "Place your hand above the red bowl."

The output will be saved as data/test_output.png.
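
To try several instructions on the same image, the command can be assembled programmatically. The sketch below only builds and prints the command lines, using flags documented in this README. The second prompt and the numbered output file names are illustrative assumptions.

```python
import shlex


def build_command(image_path, prompt, output_path):
    """Assemble a main.py invocation from the documented flags."""
    return [
        "python", "main.py",
        "--image_path", image_path,
        "--prompt", prompt,
        "--output_path", output_path,
    ]


if __name__ == "__main__":
    prompts = [
        "Place your hand above the red bowl.",           # from this README
        "Place your hand to the left of the red bowl.",  # hypothetical prompt
    ]
    for i, prompt in enumerate(prompts):
        cmd = build_command("data/test.png", prompt, f"data/test_output_{i}.png")
        # Print a shell-safe command line; pass cmd to subprocess.run to execute it
        print(shlex.join(cmd))
```

Inside the container, replacing the `print` call with `subprocess.run(cmd, check=True)` would execute each command in turn.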

More examples coming soon!

Command-Line Options

 usage: main.py [-h] [--image_path IMAGE_PATH] [--prompt PROMPT] [--use_gpt] [--yello_vlm YELLO_VLM] [--output_path OUTPUT_PATH] [--llm_path LLM_PATH] [--chat_template CHAT_TEMPLATE]

Run Words2Contact with an image and a text prompt.

options:
  -h, --help                    show this help message and exit
  --image_path IMAGE_PATH       Path to the input image file. Default: 'data/test.png'.
  --prompt PROMPT               Text prompt for Words2Contact. Default: 'Place your hand above the red bowl.'.
  --use_gpt                     Use OpenAI API for the LLM (requires `OPENAI_KEY`).
  --yello_vlm YELLO_VLM         Model to use for YELLO VLM. Default: 'GroundingDINO'.
  --output_path OUTPUT_PATH     Path to save the output image. Default: 'data/test_output.png'.
  --llm_path LLM_PATH           Path to the `.gguf` LLM model weights.
  --chat_template CHAT_TEMPLATE Chat template to use for local LLMs. Default: 'ChatML'.
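
For reference, the help text above corresponds to an `argparse` parser roughly like the following. This is a reconstruction from the usage message, not the actual contents of `main.py`. The default of `None` for `--llm_path` is an assumption, since the help text states no default for it.

```python
import argparse


def build_parser():
    """Rebuild a parser matching the documented usage message."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Run Words2Contact with an image and a text prompt.",
    )
    parser.add_argument("--image_path", default="data/test.png",
                        help="Path to the input image file.")
    parser.add_argument("--prompt", default="Place your hand above the red bowl.",
                        help="Text prompt for Words2Contact.")
    parser.add_argument("--use_gpt", action="store_true",
                        help="Use OpenAI API for the LLM (requires OPENAI_KEY).")
    parser.add_argument("--yello_vlm", default="GroundingDINO",
                        help="Model to use for YELLO VLM.")
    parser.add_argument("--output_path", default="data/test_output.png",
                        help="Path to save the output image.")
    # Assumption: no default is documented for --llm_path, so None is used here
    parser.add_argument("--llm_path", default=None,
                        help="Path to the .gguf LLM model weights.")
    parser.add_argument("--chat_template", default="ChatML",
                        help="Chat template to use for local LLMs.")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Running the sketch with no arguments prints the documented defaults, which is a convenient way to see what a bare `python main.py` would use.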

Using Local LLMs

- Download .gguf weights for local LLMs from a trusted source (e.g., TheBloke's Hugging Face models).
- Place the weights in the models/ folder.
- Specify the --llm_path argument when running the script:
```
python main.py --image_path data/test.png --llm_path models/local_model.gguf
```
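
Before passing a file to `--llm_path`, it can help to confirm that the download is really a GGUF file, since a truncated download or an HTML error page can end up with a `.gguf` name. GGUF files begin with the four ASCII bytes `GGUF`. The helper below is a hypothetical sketch and is not part of this repository.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def is_gguf(path):
    """Return True if the file starts with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC


def find_local_models(models_dir="models"):
    """List .gguf files in models_dir whose header passes the magic check."""
    root = Path(models_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob("*.gguf") if is_gguf(p))


if __name__ == "__main__":
    for model in find_local_models():
        print(f"python main.py --image_path data/test.png --llm_path {model}")
```

The check only inspects the header; it does not verify that the weights are complete or compatible with the chosen chat template.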

Contact

For questions or support, please contact:

Dionis Totsila : [email protected]

Citing Words2Contact

If you use Words2Contact, the dataset, or part of this code in your research, please cite our paper:

@INPROCEEDINGS{10769902,
  author = {Totsila, Dionis and Rouxel, Quentin and Mouret, Jean-Baptiste and Ivaldi, Serena},
  booktitle = {2024 IEEE-RAS 23rd International Conference on Humanoid Robots (Humanoids)},
  title = {Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models},
  year = {2024},
  volume = {},
  number = {},
  pages = {9-16},
  keywords = {Accuracy;Large language models;Pipelines;Natural languages;Humanoid robots;Transforms;Benchmark testing;Iterative methods;Surface treatment},
  doi = {10.1109/Humanoids58906.2024.10769902}}

Acknowledgments

This research was supported by:

- CPER CyberEntreprises
- The Creativ'Lab platform of Inria/LORIA
- The EU Horizon project euROBIN (GA n.101070596)
- The France 2030 program through the PEPR O2R projects AS3 and PI3 (ANR-22-EXOD-007, ANR-22-EXOD-004)