
Neural-Cherche

Neural Search

Neural-Cherche is a library for fine-tuning neural search models such as Splade, ColBERT, and SparseEmbed on a specific dataset. It also provides classes for running efficient inference with a fine-tuned retriever or ranker. The goal is a straightforward, effective way to fine-tune and use neural search models in both offline and online settings. Users can also save all computed embeddings to avoid recomputing them.
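The idea of saving computed embeddings so they are never recomputed can be sketched in plain Python. This is not Neural-Cherche's API. `encode` is a hypothetical stand-in for a real encoder, and the cache is assumed to be a pickled dict keyed by document id.

```python
import os
import pickle
import tempfile


def encode(text):
    """Hypothetical stand-in for a real encoder; returns a toy two-number 'embedding'."""
    return [float(len(text)), float(text.count(" "))]


def cached_embeddings(documents, key, path, encoder=encode):
    """Return {document id: embedding}, encoding only documents not already cached at `path`."""
    cache = {}
    if os.path.exists(path):
        with open(path, "rb") as f:
            cache = pickle.load(f)
    missing = [doc for doc in documents if doc[key] not in cache]
    for doc in missing:
        cache[doc[key]] = encoder(doc["text"])
    if missing:
        # Persist the updated cache so later runs can reuse it.
        with open(path, "wb") as f:
            pickle.dump(cache, f)
    return {doc[key]: cache[doc[key]] for doc in documents}


if __name__ == "__main__":
    docs = [{"id": "doc1", "text": "Paris is the capital of France."}]
    path = os.path.join(tempfile.mkdtemp(), "embeddings.pkl")
    print(cached_embeddings(docs, "id", path))  # encodes doc1 and writes the cache
    print(cached_embeddings(docs, "id", path))  # reads doc1 back from the cache
```

Only unseen document ids reach the encoder, so adding documents to a corpus costs one encoding per new document.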

Neural-Cherche runs on CPU, GPU, and MPS devices. ColBERT can be fine-tuned from a pre-trained checkpoint. Splade and SparseEmbed are trickier to fine-tune and require a pre-trained MLM (masked language model) checkpoint.
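The examples below pass a device string through the `device` argument. The helper here, `pick_device`, is hypothetical and not part of Neural-Cherche. It encodes one reasonable precedence (CUDA, then MPS, then CPU). It takes availability flags as arguments rather than importing PyTorch, so it can be tested on its own.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a torch device string, preferring CUDA, then Apple's MPS backend, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

With PyTorch installed, it could be called as `pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())`, with the result passed as `device=`.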

Installation

We can install Neural-Cherche with:

 pip install neural-cherche

If we plan to evaluate our model during training, install:

 pip install "neural-cherche[eval]"

Documentation

The full documentation is available here.

Quick start

Your training dataset must consist of triples (anchor, positive, negative). The anchor is a query, the positive is a document directly linked to the anchor, and the negative is a document not relevant to the anchor.

X = [
    ("anchor 1", "positive 1", "negative 1"),
    ("anchor 2", "positive 2", "negative 2"),
    ("anchor 3", "positive 3", "negative 3"),
]
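Real datasets usually come as queries plus relevance judgments rather than ready-made triples. The sketch below builds a list like the one above under one assumption: each negative is drawn at random from the documents not judged relevant. `build_triples` and its input layout are hypothetical and not part of Neural-Cherche.

```python
import random


def build_triples(queries, relevant, corpus, seed=0):
    """Build (anchor, positive, negative) triples with randomly sampled negatives.

    queries:  {query id: query text}
    relevant: {query id: set of relevant document ids}
    corpus:   {document id: document text}
    """
    rng = random.Random(seed)  # seeded so the sampled negatives are reproducible
    all_ids = sorted(corpus)
    triples = []
    for query_id, query_text in queries.items():
        positives = relevant.get(query_id, set())
        negatives = [doc_id for doc_id in all_ids if doc_id not in positives]
        if not negatives:
            continue  # every document is relevant, so no negative can be drawn
        for positive_id in sorted(positives):
            negative_id = rng.choice(negatives)
            triples.append((query_text, corpus[positive_id], corpus[negative_id]))
    return triples
```

Random negatives are the simplest choice. Mining harder negatives, for example from the top results of a BM25 retriever, is a common alternative but is not shown here.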

Here is how to fine-tune ColBERT from a pre-trained checkpoint with Neural-Cherche:

import torch

from neural_cherche import models, utils, train

model = models.ColBERT(
    model_name_or_path="raphaelsty/neural-cherche-colbert",
    device="cuda" if torch.cuda.is_available() else "cpu"  # or mps
)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-6)

X = [
    ("query", "positive document", "negative document"),
    ("query", "positive document", "negative document"),
    ("query", "positive document", "negative document"),
]

for step, (anchor, positive, negative) in enumerate(utils.iter(
        X,
        epochs=1,  # number of epochs
        batch_size=8,  # number of triples per batch
        shuffle=True
    )):

    loss = train.train_colbert(
        model=model,
        optimizer=optimizer,
        anchor=anchor,
        positive=positive,
        negative=negative,
        step=step,
        gradient_accumulation_steps=50,
    )

    if (step + 1) % 1000 == 0:
        # Save the model every 1000 steps
        model.save_pretrained("checkpoint")
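The training loop above depends on two pieces of bookkeeping: how `utils.iter` groups triples into batches, and when `gradient_accumulation_steps` triggers an optimizer update. The sketch below reimplements both in plain Python under stated assumptions:

- `iter_triples` is a hypothetical stand-in that yields three parallel lists per batch, matching how the loop unpacks `anchor, positive, negative`.
- `optimizer_updates` assumes the common convention of stepping the optimizer when `(step + 1) % gradient_accumulation_steps == 0`.

Check the Neural-Cherche source for its exact behavior.

```python
import random


def iter_triples(X, epochs=1, batch_size=8, shuffle=True, seed=0):
    """Yield (anchors, positives, negatives) lists of at most batch_size triples each."""
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(range(len(X)))
        if shuffle:
            rng.shuffle(order)  # shuffle whole triples so each anchor keeps its documents
        for start in range(0, len(order), batch_size):
            batch = [X[i] for i in order[start:start + batch_size]]
            anchors, positives, negatives = (list(column) for column in zip(*batch))
            yield anchors, positives, negatives


def optimizer_updates(n_steps, gradient_accumulation_steps):
    """Steps after which the optimizer would update, under the assumed (step + 1) % N convention."""
    return [s for s in range(n_steps) if (s + 1) % gradient_accumulation_steps == 0]


if __name__ == "__main__":
    X = [("query", "positive document", "negative document")] * 3
    n_steps = sum(1 for _ in iter_triples(X, epochs=1, batch_size=8))
    print(n_steps, optimizer_updates(n_steps, gradient_accumulation_steps=50))  # prints: 1 []
```

Under that assumed convention, the three-triple toy loop runs a single step and never accumulates 50 steps. Each update would then need `batch_size * gradient_accumulation_steps` triples, which is 400 with the values above.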

Retrieval

Here is how to use the fine-tuned ColBERT model to re-rank documents:

import torch
from lenlp import sparse

from neural_cherche import models, rank, retrieve

documents = [
    {"id": "doc1", "title": "Paris", "text": "Paris is the capital of France."},
    {"id": "doc2", "title": "Montreal", "text": "Montreal is the largest city in Quebec."},
    {"id": "doc3", "title": "Bordeaux", "text": "Bordeaux in Southwestern France."},
]

retriever = retrieve.BM25(
    key="id",
    on=["title", "text"],
    count_vectorizer=sparse.CountVectorizer(
        normalize=True, ngram_range=(3, 5), analyzer="char_wb", stop_words=[]
    ),
    k1=1.5,
    b=0.75,
    epsilon=0.0,
)

model = models.ColBERT(
    model_name_or_path="raphaelsty/neural-cherche-colbert",
    device="cuda" if torch.cuda.is_available() else "cpu",  # or mps
)

ranker = rank.ColBERT(
    key="id",
    on=["title", "text"],
    model=model,
)

documents_embeddings = retriever.encode_documents(
    documents=documents,
)

retriever.add(
    documents_embeddings=documents_embeddings,
)
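The BM25 retriever above is configured with `k1=1.5` and `b=0.75`. A textbook Okapi BM25 scorer shows what those two parameters control:

- `k1` caps how much repeated occurrences of a term can add to the score.
- `b` sets how strongly long documents are penalized.

This sketch assumes whitespace-tokenized input and the common non-negative IDF variant. It ignores `epsilon` and the character n-gram `CountVectorizer`, so its scores will not match the library's.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs_terms, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query.

    IDF uses the non-negative form log(1 + (N - df + 0.5) / (df + 0.5)).
    """
    if not docs_terms:
        return []
    n_docs = len(docs_terms)
    avgdl = sum(len(doc) for doc in docs_terms) / n_docs
    df = Counter(term for doc in docs_terms for term in set(doc))  # document frequency
    scores = []
    for doc in docs_terms:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: b=0 ignores length, b=1 fully scales by len(doc) / avgdl.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

For a term that appears once in two documents, the shorter document scores higher because its length normalization is smaller.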

Now we can retrieve documents with the fine-tuned model:

queries = ["Paris", "Montreal", "Bordeaux"]

queries_embeddings = retriever.encode_queries(
    queries=queries,
)

ranker_queries_embeddings = ranker.encode_queries(
    queries=queries,
)

candidates = retriever(
    queries_embeddings=queries_embeddings,
    batch_size=32,
    k=100,  # number of documents to retrieve
)

# Compute embeddings of the candidates with the ranker model.
# Note: we could also pre-compute all the embeddings.
ranker_documents_embeddings = ranker.encode_candidates_documents(
    candidates=candidates,
    documents=documents,
    batch_size=32,
)

scores = ranker(
    queries_embeddings=ranker_queries_embeddings,
    documents_embeddings=ranker_documents_embeddings,
    documents=candidates,
    batch_size=32,
)

scores

# Sample output. The integer ids differ from the string ids ("doc1", ...) used above,
# and exact similarity values depend on the checkpoint and library version.
[[{'id': 0, 'similarity': 22.825355529785156},
  {'id': 1, 'similarity': 11.201947212219238},
  {'id': 2, 'similarity': 10.748161315917969}],
 [{'id': 1, 'similarity': 23.21628189086914},
  {'id': 0, 'similarity': 9.9658203125},
  {'id': 2, 'similarity': 7.308732509613037}],
 [{'id': 1, 'similarity': 6.4031805992126465},
  {'id': 0, 'similarity': 5.601611137390137},
  {'id': 2, 'similarity': 5.599479675292969}]]
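The similarity values above come from the ColBERT ranker. ColBERT's late-interaction score (MaxSim, from Khattab and Zaharia, SIGIR 2020) works as follows:

1. Each query token embedding is matched to its most similar document token embedding.
2. The per-token maxima are summed.

Because the score is a sum over query tokens, it is not bounded by 1, which plausibly explains values above 20. The sketch uses plain lists and hand-written 2-d vectors in place of real model outputs. Whether Neural-Cherche applies any extra scaling is not shown here.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def maxsim(query_embeddings, document_embeddings):
    """Sum over query tokens of the best dot product with any document token."""
    total = 0.0
    for q in query_embeddings:
        total += max(sum(a * b for a, b in zip(q, d)) for d in document_embeddings)
    return total
```

With unit-length token vectors, each query token contributes at most 1, so the score is at most the number of query tokens.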

Neural-Cherche provides SparseEmbed, SPLADE, TFIDF, and BM25 retrievers, plus a ColBERT ranker that can re-order a retriever's output. For more information, please refer to the documentation.
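The retriever-plus-ranker setup follows a generic two-stage pattern. A cheap scorer narrows the corpus to `k` candidates, and a costlier scorer re-orders only those candidates. The sketch below shows the pattern in plain Python. `overlap` and `prefers_short` are hypothetical toy scorers standing in for a retriever such as BM25 and a ranker such as ColBERT. None of this is Neural-Cherche code.

```python
def retrieve_then_rerank(query, documents, first_stage, second_stage, k=100):
    """Keep the top-k documents by first_stage, then re-order them by second_stage."""
    candidates = sorted(documents, key=lambda doc: first_stage(query, doc), reverse=True)[:k]
    return sorted(candidates, key=lambda doc: second_stage(query, doc), reverse=True)


def overlap(query, doc):
    """Toy first-stage scorer: number of lowercase words shared by query and document."""
    return len(set(query.lower().split()) & set(doc["text"].lower().split()))


def prefers_short(query, doc):
    """Toy second-stage scorer: word overlap divided by document length."""
    return overlap(query, doc) / len(doc["text"].split())
```

The second-stage scorer never sees documents outside the top `k`. A small `k` therefore saves computation but can cap recall.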

Pre-trained models

We provide pre-trained checkpoints designed specifically for Neural-Cherche: raphaelsty/neural-cherche-sparse-embed and raphaelsty/neural-cherche-colbert. These checkpoints are fine-tuned on a subset of the MS-MARCO dataset and would benefit from further fine-tuning on your own dataset. You can fine-tune ColBERT from any pre-trained Transformer checkpoint to fit your specific language. You should use an MLM checkpoint to fine-tune SparseEmbed.

Dataset: scifact
Model	HuggingFace checkpoint	ndcg@10	hits@10	hits@1
TFIDF	-	0.62	0.86	0.50
BM25	-	0.69	0.92	0.56
SparseEmbed	raphaelsty/neural-cherche-sparse-embed	0.62	0.87	0.48
Sentence Transformer	sentence-transformers/all-mpnet-base-v2	0.66	0.89	0.53
ColBERT	raphaelsty/neural-cherche-colbert	0.70	0.92	0.58
TFIDF Retriever + ColBERT Ranker	raphaelsty/neural-cherche-colbert	0.71	0.94	0.59
BM25 Retriever + ColBERT Ranker	raphaelsty/neural-cherche-colbert	0.72	0.95	0.59
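For reference, the metrics in the table can be computed per query as below and then averaged over the scifact queries. These are the standard binary-relevance definitions of hits@k and nDCG@k. That the table was produced with exactly these definitions is an assumption.

```python
import math


def hits_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any relevant document appears in the top k results, else 0.0."""
    return float(any(doc_id in relevant_ids for doc_id in ranked_ids[:k]))


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance nDCG@k: DCG of the ranking divided by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, doc_id in enumerate(ranked_ids[:k]) if doc_id in relevant_ids)
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant_ids), k)))
    return dcg / ideal if ideal > 0 else 0.0
```

A single relevant document ranked second yields hits@1 = 0, hits@10 = 1, and nDCG@10 = 1 / log2(3), about 0.63.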

Neural-Cherche contributors

Benjamin Clavié
Arthur Satouf

References

SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking, by Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant. SIGIR 2021.
SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval, by Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stéphane Clinchant. SIGIR 2022.
SparseEmbed: Learning Sparse Lexical Representations with Contextual Embeddings for Retrieval, by Weize Kong, Jeffrey M. Dudek, Cheng Li, Mingyang Zhang, and Mike Bendersky. SIGIR 2023.
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT, by Omar Khattab, Matei Zaharia. SIGIR 2020.

License

This Python library is licensed under the MIT open-source license. The Splade model is licensed by its authors for non-commercial use only. SparseEmbed and ColBERT are fully open source, including for commercial use.


More information

Version: 1.4.3
Type: AI source code
Updated: 2025-09-08
Size: 1.98MB
Source: GitHub
