unsupervised passage reranking

Table of Contents

Setup
Input Data Format
Downloading Data
Usage
Results
Issues
Citation

This repository contains the official implementation of the UPR (Unsupervised Passage Re-ranking) algorithm.

Figure: UPR algorithm

Figure: Results after re-ranking the top-1000 Wikipedia passages

Setup

To use this repo, a standard installation of PyTorch is needed. We list the dependencies in the requirements.txt file.

We recommend using one of the recent PyTorch containers from NGC. The Docker image can be pulled with the command docker pull nvcr.io/nvidia/pytorch:22.01-py3. To use this Docker image, the NVIDIA Container Toolkit must also be installed.

Inside the Docker container, please install the transformers and sentencepiece libraries using pip install.

Input Data Format

Wikipedia Evidence Passages

We follow the DPR convention and segment the Wikipedia articles into 100-word passages. DPR's provided evidence file can be downloaded with the command

python utils/download_data.py --resource data.wikipedia-split.psgs_w100

This evidence file contains tab-separated fields for the passage id, passage text, and passage title.

id	text	title
1	"Aaron Aaron ( or ; ""Ahärôn"") is a prophet, high priest, and the brother of Moses in the Abrahamic religions. Knowledge of Aaron, along with his brother Moses, comes exclusively from religious texts, such as the Bible and Quran. The Hebrew Bible relates that, unlike Moses, who grew up in the Egyptian royal court, Aaron and his elder sister Miriam remained with their kinsmen in the eastern border-land of Egypt (Goshen). When Moses first confronted the Egyptian king about the Israelites, Aaron served as his brother's spokesman (""prophet"") to the Pharaoh. Part of the Law (Torah) that Moses received from"	Aaron
2	"God at Sinai granted Aaron the priesthood for himself and his male descendants, and he became the first High Priest of the Israelites. Aaron died before the Israelites crossed the North Jordan river and he was buried on Mount Hor (Numbers 33:39; Deuteronomy 10:6 says he died and was buried at Moserah). Aaron is also mentioned in the New Testament of the Bible. According to the Book of Exodus, Aaron first functioned as Moses' assistant. Because Moses complained that he could not speak well, God appointed Aaron as Moses' ""prophet"" (Exodus 4:10-17; 7:1). At the command of Moses, he let"	Aaron
...	...	...
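
As a rough illustration of the layout above, the evidence file can be read with Python's standard csv module, which handles the tab delimiter and the doubled quotes inside the text field. This is a minimal sketch and not the loader used by upr.py; the function name read_evidence and the two-row in-memory sample are assumptions made for the example.

```python
import csv
import io


def read_evidence(handle):
    """Return {passage_id: (text, title)} from a tab-separated evidence file."""
    reader = csv.reader(handle, delimiter="\t", quotechar='"')
    next(reader)  # skip the header row: id, text, title
    passages = {}
    for row in reader:
        if len(row) != 3:
            continue  # this sketch simply skips malformed rows
        passage_id, text, title = row
        passages[int(passage_id)] = (text, title)
    return passages


# Tiny in-memory stand-in for psgs_w100.tsv (hypothetical, shortened rows).
sample = io.StringIO(
    'id\ttext\ttitle\n'
    '1\t"Aaron is a ""prophet"" and high priest."\tAaron\n'
    '2\t"God at Sinai granted Aaron the priesthood."\tAaron\n'
)
evidence = read_evidence(sample)
print(evidence[1])
```

For the real file, the same function could be called on open("wikipedia-split/psgs_w100.tsv", newline="", encoding="utf-8"); loading every passage into a dictionary this way needs a correspondingly large amount of memory.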

Top-K Retrieved Passages

The input data format is JSON. Each dictionary in the JSON file contains one question, a list with the data of the top-K retrieved passages, and an (optional) list of possible answers. For each top-K passage, we include the (evidence) id, has_answer, and (optional) retriever score attributes. The id attribute is the passage id from the Wikipedia evidence file, and has_answer denotes whether the passage text contains the answer span. Following is the template of the .json file

[
  {
    "question": "....",
    "answers": ["...", "...", "..."],
    "ctxs": [
              {
                "id": "....",
                "score": "...",
                "has_answer": "...."
              },
              ...
            ]
  },
  ...
]

An example in which passages are retrieved with BM25 for a question from the Natural Questions dataset.

[
  {
    "question": "who sings does he love me with reba",
    "answers": ["Linda Davis"],
    "ctxs": [
              {
                "id": 11828871,
                "score": 18.3,
                "has_answer": false
              },
              {
                "id": 11828872,
                "score": 14.7,
                "has_answer": false
              },
              {
                "id": 11828866,
                "score": 14.4,
                "has_answer": true
              },
              ...
            ]
  },
  ...
]
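
The has_answer flags are enough to compute a top-k retrieval accuracy: the fraction of questions for which at least one of the first k passages contains an answer. The sketch below shows that calculation on the example record above; it is an illustration under that definition, not necessarily how upr.py computes the numbers it reports, and the function name topk_accuracy is an assumption.

```python
import json


def topk_accuracy(records, ks=(1, 5, 20, 100)):
    """Fraction of questions with an answer-bearing passage among the first k."""
    hits = {k: 0 for k in ks}
    for record in records:
        flags = [bool(ctx["has_answer"]) for ctx in record["ctxs"]]
        for k in ks:
            if any(flags[:k]):
                hits[k] += 1
    total = len(records)
    if total == 0:
        return {k: 0.0 for k in ks}
    return {k: hits[k] / total for k in ks}


# The BM25 example record, with the elided entries left out.
raw = """
[
  {"question": "who sings does he love me with reba",
   "answers": ["Linda Davis"],
   "ctxs": [
     {"id": 11828871, "score": 18.3, "has_answer": false},
     {"id": 11828872, "score": 14.7, "has_answer": false},
     {"id": 11828866, "score": 14.4, "has_answer": true}
   ]}
]
"""
records = json.loads(raw)
accuracy = topk_accuracy(records, ks=(1, 2, 3))
print(accuracy)
```

Here the first answer-bearing passage sits at rank 3, so the question counts as a miss for k=1 and k=2 and as a hit for k=3.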

Downloading Data

We provide the top-1000 retrieved passages for the dev/test splits of the NaturalQuestions-Open (NQ), TriviaQA, SQuAD-Open, WebQuestions (WebQ), and EntityQuestions (EQ) datasets, covering 5 retrievers: BM25, MSS, Contriever, DPR, and MSS-DPR. Please use the following command to download these datasets

python utils/download_data.py \
	--resource {key from download_data.py's RESOURCES_MAP} \
	[optional --output_dir {your location}]

For example, to download all the top-K data, use --resource data. To download the top-K data of a specific retriever, for example BM25, use --resource data.retriever-outputs.bm25.

Usage

To re-rank the retrieved passages with UPR, please use the following command, in which the paths of the evidence file and the top-K retrieved passages file must be specified.

DISTRIBUTED_ARGS="-m torch.distributed.launch --nproc_per_node 8 --nnodes 1 --node_rank 0 --master_addr localhost --master_port 6000"

python ${DISTRIBUTED_ARGS} upr.py \
  --num-workers 2 \
  --log-interval 1 \
  --topk-passages 1000 \
  --hf-model-name "bigscience/T0_3B" \
  --use-gpu \
  --use-bf16 \
  --report-topk-accuracies 1 5 20 100 \
  --evidence-data-path "wikipedia-split/psgs_w100.tsv" \
  --retriever-topk-passages-path "bm25/nq-dev.json"

--use-bf16 provides speed-ups and memory savings on Ampere GPUs such as the A100 or A6000. When working with V100 GPUs, however, this argument should be removed.

We have provided an example script upr-demo.sh under the examples directory. To use this script, please modify the data and input / output file paths accordingly.
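
Conceptually, re-ranking means re-scoring each of the top-K passages for a question and sorting the list by the new score. The sketch below shows only that outer loop. The scorer is a deliberately simple word-overlap function standing in for the language-model score that upr.py computes, and the names rerank and overlap_score, as well as the two toy passages, are assumptions made for this example.

```python
def rerank(question, ctxs, evidence, score_fn, topk=1000):
    """Re-score the first `topk` contexts and sort them by the new score, best first."""
    rescored = []
    for ctx in ctxs[:topk]:
        text, title = evidence[ctx["id"]]
        new_ctx = dict(ctx)  # copy so the retriever output is left untouched
        new_ctx["score"] = score_fn(question, title, text)
        rescored.append(new_ctx)
    return sorted(rescored, key=lambda c: c["score"], reverse=True)


def overlap_score(question, title, text):
    """Toy stand-in scorer: share of question words that occur in the passage."""
    q_words = set(question.lower().split())
    p_words = set((title + " " + text).lower().split())
    return len(q_words & p_words) / max(len(q_words), 1)


# Hypothetical two-passage evidence store and retriever output.
evidence = {
    1: ("Reba McEntire is an American country singer.", "Reba McEntire"),
    2: ("Does He Love You is a duet sung by Reba and Linda Davis.", "Does He Love You"),
}
ctxs = [
    {"id": 1, "score": 18.3, "has_answer": False},
    {"id": 2, "score": 14.7, "has_answer": True},
]
reranked = rerank("who sings does he love me with reba", ctxs, evidence, overlap_score)
print([c["id"] for c in reranked])
```

In this toy run the answer-bearing passage moves from rank 2 to rank 1. Keeping the scorer as a plain function argument makes it easy to see where a model-based score would plug in.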

Results

We provide the evaluation scores on the test sets of the datasets when using the T0-3B language model in UPR.

Retrieval

Top-20 retrieval accuracy for unsupervised retrievers

Retriever (+Re-Ranker)	SQuAD-Open	TriviaQA	Natural Questions-Open	Web Questions	Entity Questions
MSS	51.3	67.2	60.0	49.2	51.2
MSS + UPR	75.7	81.3	77.3	71.8	71.3
BM25	71.1	76.4	62.9	62.4	71.2
BM25 + UPR	83.6	83.0	78.6	72.9	79.3
Contriever	63.4	73.9	67.9	65.7	63.0
Contriever + UPR	81.3	82.8	80.4	75.7	76.0

Top-20 retrieval accuracy for supervised retrievers

Retriever (+Re-Ranker)	SQuAD-Open	TriviaQA	Natural Questions-Open	Web Questions	Entity Questions
DPR	59.4	79.8	79.2	74.6	51.1
DPR + UPR	80.7	84.3	83.4	76.5	65.4
MSS-DPR	73.1	81.9	81.4	76.9	60.6
MSS-DPR + UPR	85.2	84.8	83.9	77.2	73.9

Ablation Study: Impact of Pre-trained Language Models

We re-rank the union of the top-1000 passages retrieved by each of the BM25 and MSS retrievers on the Natural Questions-Open development set. This data file can be downloaded as:

python utils/download_data.py --resource data.retriever-outputs.mss-bm25-union.nq-dev

For these ablation experiments, we pass the argument --topk-passages 2000, as this file contains the union of two sets of top-1000 passages.
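
One way to picture such a union is to concatenate the two ranked lists and keep each passage id once, which gives at most 2000 entries per question. How the released file was actually built is not stated here, so the sketch below is an assumption for illustration only; the function name union_by_id and the toy lists are hypothetical.

```python
def union_by_id(*ranked_lists):
    """Concatenate ranked context lists, keeping the first occurrence of each id."""
    seen = set()
    merged = []
    for ctxs in ranked_lists:
        for ctx in ctxs:
            if ctx["id"] not in seen:
                seen.add(ctx["id"])
                merged.append(ctx)
    return merged


# Hypothetical top-2 lists from two retrievers for the same question.
bm25_ctxs = [{"id": 1, "score": 18.3}, {"id": 2, "score": 14.7}]
mss_ctxs = [{"id": 2, "score": 0.9}, {"id": 3, "score": 0.8}]

merged = union_by_id(bm25_ctxs, mss_ctxs)
print([ctx["id"] for ctx in merged])
```

Scores from different retrievers are not comparable, so the merged list is not meaningfully ordered until the re-ranker assigns new scores.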

Language Model	Retriever	Top-1	Top-5	Top-20	Top-100
-	BM25	22.3	43.8	62.3	76.0
-	MSS	17.7	38.6	57.4	72.4
T5 (3B)	BM25 + MSS	22.0	50.5	71.4	84.0
GPT-neo (2.7B)	BM25 + MSS	27.2	55.0	73.9	84.2
GPT-J (6B)	BM25 + MSS	29.8	59.5	76.8	85.6
T5-lm-adapt (250M)	BM25 + MSS	23.9	51.4	70.7	83.1
T5-lm-adapt (800M)	BM25 + MSS	29.1	57.5	75.1	84.8
T5-lm-adapt (3B)	BM25 + MSS	29.7	59.9	76.9	85.6
T5-lm-adapt (11B)	BM25 + MSS	32.1	62.3	78.5	85.8
T0-3B	BM25 + MSS	36.7	64.9	79.1	86.1
T0-11B	BM25 + MSS	37.4	64.9	79.1	86.0

The GPT models can be run in UPR using the script gpt/upr_gpt.py. This script has similar options to the upr.py script, but --use-fp16 must be passed as an argument instead of --use-bf16. The argument of --hf-model-name can be either EleutherAI/gpt-neo-2.7B or EleutherAI/gpt-j-6B.

Open-Domain QA Experiments

Please see the directory open-domain-qa for details on training and inference with the pre-trained checkpoints.

Issues

For any errors or bugs in the codebase, please either open a new issue or send an email to Devendra Singh Sachan ([email protected])

Citation

If you find this code or data useful, please consider citing our paper as:

@inproceedings{sachan2022improving,
  title = "Improving Passage Retrieval with Zero-Shot Question Generation",
  author = "Sachan, Devendra Singh and Lewis, Mike and Joshi, Mandar and Aghajanyan, Armen and Yih, Wen-tau and Pineau, Joelle and Zettlemoyer, Luke",
  booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
  publisher = "Association for Computational Linguistics",
  url = "https://arxiv.org/abs/2204.07496",
  year = "2022"
}
