Dice Relevancy Feedback

Dice.com's Solr plugins for performing personalized search and recommendations (via the Relevancy Feedback plugin) and conceptual / semantic search (via the Unsupervised Feedback plugin).

Links

Slides from the talk
Video of the talk

Building the Plugin

A pre-built jar file can be found in the ./target folder. The project contains a maven pom.xml file, which can also be used to build the plugin from source.

Supported Solr Versions

Solr 5.4 (see branch)
Solr 6.3 (see branch) - also master
Solr 7.0 (see branch) - also works in 7.1

If there is a particular version of Solr you need this for, please create a GitHub issue and I'll see what I can do. To compile it yourself for a specific version, update the versions of the Solr and Lucene libraries in the pom.xml file, use Maven to pull in those dependencies and build the plugin, then fix any compilation errors.

Importing into Solr

Please see the official Solr guidelines for registering plugins with Solr. This involves placing the JAR file in one of the folders that Solr scans for class and JAR files on core reload.

Solr Plugins
Adding Custom Plugins in Solr Cloud

Relevancy Feedback Plugin

An example request handler configuration for solrconfig.xml is shown below, with comments outlining the main parameters:

 <requestHandler name="/rf" class="org.dice.solrenhancements.relevancyfeedback.RelevancyFeedbackHandler">
        <lst name="defaults">
            <str name="omitHeader">true</str>
            <str name="wt">json</str>
            <str name="indent">true</str>

            <!-- Regular query configuration - query parser used when parsing the rf query-->
            <str  name="defType">lucene</str>
            
            <!-- fields returned -->
            <str  name="fl">jobTitle,skill,company</str>
            <!-- fields to match on-->
            <str  name="rf.fl">skillFromSkill,extractTitles</str>
            
            <!-- Field weights. Term weights are normalized within each field, so the fields are weighted exactly in this ratio
            even though different fields can match different numbers of terms -->
            <str  name="rf.qf">skillFromSkill^3 extractTitles^4.5</str>
            
            <int  name="rows">10</int>
            
            <!-- How many terms to extract per field (max) -->
            <int  name="rf.maxflqt">10</int>
            <bool name="rf.boost">true</bool>
            
            <!-- normalize the weights for terms in each field (custom to dice rf, not present in solr MLT) -->
            <bool name="rf.normflboosts">true</bool>
            
            <!-- Take the raw term frequencies (false) or log of the term frequencies (true) -->
            <bool name="rf.logtf">true</bool>
            
            <!-- Minimum should match settings for the rf query - determines what proportion of the terms have to match -->
            <!-- See the Solr edismax mm parameter for specifics -->
            <str  name="rf.mm">25%</str>
            
            <!-- Returns the top k terms (see regular solr MLT handler) -->
            <str  name="rf.interestingTerms">details</str>
            
            <!-- Turns the rf query into a boost query using a multiplicative boost, allowing for boosting -->
            <str  name="rf.boostfn"></str>
            
            <!-- q parameter - If you want to execute one query and use the rf query to boost its results (e.g. for personalizing search),
            	 pass the user's query into this parameter. It accepts regular query syntax, e.g. rf.q={!edismax df=title qf=.... v=$qq}&qq=Java
            	 The regular q parameter is reserved for the rf query (see above)
            -->
            <str name="q"></str>
            <!-- Query parser to use for the q query, if passed -->
            <str name="defType"></str>

            <!-- rf.q parameter - Note that the regular q parameter is used only for personalized search scenarios, where you have a main query
            	 and you want to use the generated rf query to boost the main query's documents. Typically the rf.q query identifies
            	 one or more documents, e.g. rf.q=(id:686867 id:98928980 id:999923), but it can be any query. Note that ALL documents
            	 matched by the rf.q query are processed (it is intended mainly for looking up docs by id), so use it cautiously.
            -->
            <str name="rf.q"></str>
            <!-- query parser to use for the rf.q query -->
            <str name="rf.defType"></str>
            
            <!-- Settings for personalized search - use the regular parameter names for the query parser defined by defType parameter -->
            <str name="df">title</str>
            <str name="qf"> company_text^0.01 title^12 skill^4 description^0.3</str>
            <str name="pf2">company_text^0.01 title^12 skill^4 description^0.6</str> 
            
            <!-- Content-based recommendation settings (post a document to the endpoint in a POST request). stream.body and stream.head are form parameters.
                 You can send them in a GET request, but a POST handles larger data. If you have really large documents, you will need to change the buffer settings
                 so that the request doesn't exceed the buffer limits in Solr or your web server.
             -->
            
            <!-- Fields used for processing documents posted to the stream.body and stream.head parameters in a POST call -->
            <str  name="stream.head.fl">title,title_syn</str>
            <str  name="stream.body.fl">extractSkills,extractTitles</str>
            
            <!-- Pass a URL in this parameter for Solr to download the web page and process the HTML using the fields configured in the stream.qf parameters -->
            <str  name="stream.url"></str>
            
            <!-- Note that there are two different content stream fields to pass in the POST request. This allows different analyzers to be applied to each.
            For instance, we pass the job title into the stream.head parameter to parse out job titles, while we pass the job description into the stream.body parameter
            to parse out skills -->
            <!-- Pass the document body in this parameter as a form parameter. Analysed using the stream.body.fl fields-->
            <str  name="stream.body"></str>
            <!-- Pass the second document field in this parameter. Analysed using the stream.head.fl fields-->
            <str  name="stream.head"></str>
            
            <!-- Specifies a separate set of field weights to apply when processing a document posted to the request handler via the
                 stream.body and stream.head parameters -->
            <str  name="stream.qf">extractSkills^4.5 extractTitles^2.25 title^3.0 title_syn^3.0</str>
        </lst>
</requestHandler>
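The parameters above (rf.fl, rf.maxflqt, rf.logtf, rf.normflboosts, rf.qf and rf.mm) interact when the rf query is built. The following is a minimal Python sketch of that per-field weighting, written from the configuration comments rather than from the plugin's Java source. The input format, the idf formula, the use of L2 (unit-length) normalization and all function names are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

TermStats = Dict[str, Tuple[int, int]]  # term -> (tf in seed docs, document frequency); assumed input
Clause = Tuple[str, str, float]         # (field, term, boost)


def build_rf_clauses(field_terms: Dict[str, TermStats], qf: Dict[str, float],
                     num_docs: int, max_terms_per_field: int = 10,
                     log_tf: bool = True, norm_field_boosts: bool = True) -> List[Clause]:
    """Select and weight the terms of an rf-style query, field by field (illustrative only)."""
    clauses: List[Clause] = []
    for field, stats in field_terms.items():                 # rf.fl
        scored = []
        for term, (tf, df) in stats.items():
            tf_part = math.log(1 + tf) if log_tf else float(tf)  # rf.logtf
            idf = 1.0 + math.log(num_docs / (1 + df))        # assumed idf variant
            scored.append((term, tf_part * idf))
        scored.sort(key=lambda ts: ts[1], reverse=True)
        top = scored[:max_terms_per_field]                   # rf.maxflqt, applied per field
        # rf.normflboosts: scale this field's weights to unit (L2) length so every
        # field contributes equally, however many terms it matched
        norm = math.sqrt(sum(s * s for _, s in top)) if norm_field_boosts else 1.0
        norm = norm or 1.0                                   # guard against all-zero scores
        weight = qf.get(field, 1.0)                          # rf.qf relative field weight
        clauses.extend((field, term, weight * s / norm) for term, s in top)
    return clauses


def min_should_match(num_clauses: int, percent: int) -> int:
    """Plain 'N%' mm as in edismax: the computed clause count is rounded down."""
    return num_clauses * percent // 100


def to_query(clauses: List[Clause]) -> str:
    """Render clauses as space-separated terms (OR-ed under the default q.op)."""
    return " ".join(f"{field}:{term}^{boost:.4f}" for field, term, boost in clauses)
```

With 20 extracted terms, for instance, rf.mm=25% would require `min_should_match(20, 25)`, i.e. 5 of them, to match.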

Example Request

http://localhost:8983/solr/jobs/rf?q=id:11f407d319d6cc707437fad874a097c0+id:a2fd2f2e34667d61fadcdcabfd359cf4&rows=10&df=title&fl=title,skills,geocode,city,state&wt=json
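Building the example request in code avoids hand-encoding the id list and field list. A minimal Python sketch using only the standard library; the function name `rf_request_url` is hypothetical, and the host, core name (`jobs`) and field names are taken from the example request above.

```python
from typing import Iterable
from urllib.parse import urlencode


def rf_request_url(base: str, doc_ids: Iterable[str], rows: int = 10,
                   fields: Iterable[str] = ("title", "skills", "geocode", "city", "state")) -> str:
    params = {
        # Seed documents: one id: clause per document, OR-ed under the default q.op
        "q": " ".join(f"id:{doc_id}" for doc_id in doc_ids),
        "rows": rows,
        "df": "title",
        "fl": ",".join(fields),
        "wt": "json",
    }
    # urlencode percent-escapes ':' and ',' and turns spaces into '+'
    return f"{base}?{urlencode(params)}"
```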

Example Response

 {
  "match":{
      "numFound":2,
      "start":0,
      "docs":[
          {
            "id":"a2fd2f2e34667d61fadcdcabfd359cf4",        
            "title":"Console AAA Sports Video Game Programmer.",
            "skills":["Sports Game Experience a plus.",
              "2-10 years plus Console AAA Video Game Programming Experience"],
            "geocode":"38.124447,-122.55051",
            "city":"Novato",
            "state":"CA"
          },
          {
            "id":"11f407d319d6cc707437fad874a097c0",
            "title":"Game Engineer - Creative and Flexible Work Environment!",
            "skills":["3D Math",
              "Unity3d",
              "C#",
              "3D Math - game programming",
              "game programming",
              "C++",
              "Java"],
            "geocode":"33.97331,-118.243614",
            "city":"Los Angeles",
            "state":"CA"
          }
      ]
  },
  "response":{
      "numFound":5333,
      "start":0,
      "docs":[
          {
            "title":"Software Design Engineer 3 (Game Developer)",
            "skills":["C#",
              "C++",
              "Unity"],
            "geocode":"47.683647,-122.12183",
            "city":"Redmond",
            "state":"WA"
          },
          {          
            "title":"Game Server Engineer - MMO Mobile Gaming Start-Up!",
            "skills":["AWS",
              "Node.JS",
              "pubnub",
              "Websockets",
              "pubnub - Node.JS",
              "Vagrant",
              "Linux",
              "Git",
              "MongoDB",
              "Jenkins",
              "Docker"],
            "geocode":"37.777115,-122.41733",
            "city":"San Francisco",
            "state":"CA"
          },...
      ]
   }
}
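In the response above, `match` lists the documents matched by q (the two seed jobs, numFound 2), while `response` lists the recommendations generated from them. A small Python sketch that separates the two; the function name is hypothetical and only the standard `json` module is used.

```python
import json
from typing import List, Tuple


def split_rf_response(body: str) -> Tuple[List[str], List[str], int]:
    """Return (seed ids, recommended titles, total recommendations) from an /rf response."""
    data = json.loads(body)
    match = data.get("match", {})        # documents the rf query was built from
    response = data.get("response", {})  # documents recommended by the rf query
    seed_ids = [doc.get("id", "") for doc in match.get("docs", [])]
    titles = [doc.get("title", "") for doc in response.get("docs", [])]
    return seed_ids, titles, response.get("numFound", 0)
```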

Unsupervised Feedback Plugin (Blind Feedback)

An example request handler configuration for solrconfig.xml is shown below, with comments outlining the main parameters:

 <requestHandler name="/ufselect" class="org.dice.solrenhancements.unsupervisedfeedback.UnsupervisedFeedbackHandler">
        <lst name="defaults">
            <str name="omitHeader">true</str>
            <str name="wt">json</str>
            <str name="indent">true</str>

            <!-- Regular query configuration -->
            <str  name="defType">edismax</str>
            <str  name="df">title</str>
            <str  name="qf">title^1.5   skills^1.25 description^1.1</str>
            <str  name="pf2">title^3.0  skills^2.5  description^1.5</str>
            <str  name="mm">1</str>
            <str  name="q.op">OR</str>

            <str  name="fl">jobTitle,skills,company</str>
            <int  name="rows">30</int>
                        
            <!-- Unsupervised Feedback (Blind Feedback) query configuration-->
            <str  name="uf.fl">skillsFromskills,titleFromJobTitle</str>
            <!-- How many docs to extract the top terms from -->
            <str  name="uf.maxdocs">50</str>
            <!-- How many terms to extract per field (max) -->
            <int  name="uf.maxflqt">10</int>
            <bool name="uf.boost">true</bool>
            <!-- Relative per-field boosts on the extracted terms (similar to the edismax qf parameter) -->
            <!-- NOTE: with uf.normflboosts=true, all terms are normalized so that the total importance of each
            	field on the query is the same, then these relative boosts are applied per field-->
            
            <str  name="uf.qf">skillsFromskills^4.5 titleFromJobTitle^6.0</str>
            
            <!-- Returns the top k terms (see regular solr MLT handler) -->
            <str  name="uf.interestingTerms">details</str>

            <!-- Unit-length normalize all term boosts within a field (recommended) - see the talk for details -->
            <bool name="uf.normflboosts">true</bool>
            <!-- Use raw term counts (false) or log term counts (true)? -->
            <bool name="uf.logtf">false</bool>
        </lst>
</requestHandler>
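Blind (pseudo-relevance) feedback treats the top-ranked results of the initial query as if they were relevant and mines them for expansion terms. A simplified Python sketch of that step, assuming documents are already tokenized into per-field term lists; it ignores the idf weighting and normalization the handler applies, and `blind_feedback_terms` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List

Doc = Dict[str, List[str]]  # field -> analyzed terms (hypothetical pre-tokenized input)


def blind_feedback_terms(ranked_docs: List[Doc], fields: List[str],
                         max_docs: int = 50,
                         max_terms_per_field: int = 10) -> Dict[str, List[str]]:
    """Treat the top max_docs hits as relevant and return the most frequent
    terms in each field (raw counts, i.e. uf.logtf=false; no idf weighting)."""
    top_docs = ranked_docs[:max_docs]                     # uf.maxdocs
    expansion: Dict[str, List[str]] = {}
    for field in fields:                                  # uf.fl
        counts = Counter(term for doc in top_docs for term in doc.get(field, []))
        # uf.maxflqt: keep the top k terms for each field separately
        expansion[field] = [t for t, _ in counts.most_common(max_terms_per_field)]
    return expansion
```

In the handler, the selected terms are then weighted using uf.qf and uf.normflboosts before the expanded query is executed.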

Example Request

http://localhost:8983/solr/dicejobscp/ufselect?q=machine+learning+engineer&start=0&rows=10&uf.logtf=false&fl=title,skills,geocode,city,state&fq={!geofilt+sfield=jobEndecaGeoCode+d=48+pt=39.6955,-105.0841}&wt=json
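The example request is easier to read when assembled from its parts. A Python sketch with hypothetical helper names that builds the /ufselect URL, including the geofilt filter query; the core name `dicejobscp` and the spatial field `jobEndecaGeoCode` come from the example and would differ in your schema.

```python
from typing import List, Tuple
from urllib.parse import urlencode


def geofilt(sfield: str, lat: float, lon: float, dist_km: float) -> str:
    """Local-params filter restricting results to dist_km around (lat, lon)."""
    return f"{{!geofilt sfield={sfield} d={dist_km} pt={lat},{lon}}}"


def uf_request_url(base: str, query: str, filters: List[str], rows: int = 10) -> str:
    # A list of pairs (rather than a dict) lets fq be repeated for several filters
    params: List[Tuple[str, str]] = [
        ("q", query),
        ("start", "0"),
        ("rows", str(rows)),
        ("uf.logtf", "false"),
        ("fl", "title,skills,geocode,city,state"),
    ]
    params += [("fq", f) for f in filters]
    params.append(("wt", "json"))
    return f"{base}?{urlencode(params)}"
```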

Example Response

 {
  "match":
  {
    "numFound":7729,
    "start":0,
    "docs":[
      {
        "title":"NLP/Machine Learning Engineer",
        "skills":["Linux",
          "NLP (Natural Language Processing)",
          "SQL",
          "Bash",
          "Python",
          "ML (Machine Learning)",
          "JavaScript",
          "Java"],
        "geocode":"42.35819,-71.050674",
        "city":"Boston",
        "state":"MA"
      },
      {
        "title":"Machine Learning Engineer",
        "skills":["machine learning",
          "java",
          "scala"],
        "geocode":"47.60473,-122.32594",
        "city":"Seattle",
        "state":"WA"
      },
      {
        "title":"Machine Learning Engineer - REMOTE!",
        "skills":["Neo4j",
          "Hadoop",
          "gensim",
          "gensim - C++",
          "Java",
          "R",
          "MongoDB",
          "elastic search",
          "sci-kit learn",
          "Python",
          "C++"],
        "geocode":"37.777115,-122.41733",
        "city":"San Francisco",
        "state":"CA"
      },...
    ]
  }
}

Isn't This Just the MLT Handler?

While it is loosely based on the Solr MLT handler code and algorithm (which is essentially the Rocchio algorithm), there are some key differences in the algorithm design. The MLT handler takes the top k terms across all configured fields when constructing the MLT query. If one field has a broader vocabulary than the others, the average document frequency of its terms will be lower than in fields with smaller vocabularies. Those terms therefore get relatively high IDF scores and tend to dominate the top terms selected by the Solr MLT handler. Our request handler instead takes the top k terms per field. It also ensures that, no matter how many terms are matched per field (up to the configured limit), each field carries the same weight in the resulting query as every other field before the field-specific weights in the rf.qf parameter are applied. This is the second problem with the Solr MLT handler that we address.

We also provide a lot of extra functionality. We allow content streams to be passed in, we support matching against multiple documents (more like 'these' as opposed to more like 'this'), and we allow the boost query parser to be applied to the resulting MLT query. We also support the mm parameter, so we can force the returned documents to match only a set percentage of the top terms.
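The difference between global and per-field term selection can be seen with a toy example. In this Python sketch the term scores are made-up numbers standing in for tf-idf values, with the broad-vocabulary field given the higher scores, and the function names are hypothetical; it only illustrates how such a field can take every slot in a global top k.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, Dict[str, float]]  # field -> term -> score


def top_k_global(scores: Scores, k: int) -> List[Tuple[str, str]]:
    """MLT-style: rank every (field, term) pair together and keep k overall."""
    ranked = sorted(((s, f, t) for f, terms in scores.items() for t, s in terms.items()),
                    reverse=True)
    return [(f, t) for _, f, t in ranked[:k]]


def top_k_per_field(scores: Scores, k: int) -> List[Tuple[str, str]]:
    """Per-field (like rf.maxflqt): keep up to k terms from each field."""
    picked: List[Tuple[str, str]] = []
    for field, terms in scores.items():
        best = sorted(terms.items(), key=lambda kv: kv[1], reverse=True)[:k]
        picked.extend((field, term) for term, _ in best)
    return picked


# Toy scores: the description field has rarer terms, hence higher idf-driven scores
toy = {"description": {"kubernetes": 9.0, "terraform": 8.5, "helm": 8.0},
       "title": {"engineer": 3.0, "developer": 2.5}}
```

With k=3, `top_k_global(toy, 3)` returns only description terms, whereas `top_k_per_field(toy, 2)` keeps two terms from each field.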

Important Considerations When Using for Personalized Search

If you want to use this for personalized search, as demonstrated in my Lucene Revolution 2017 talk, you need to pass in the user's current query using the regular q parameter, while the information used to generate the Rocchio query is passed via the rf.q parameter. Note, however, that because of the normalization the algorithm applies, the boosts on the terms in the Rocchio query are not weighted comparably to the terms in your user query. You will therefore need to experiment with different rf.qf values until you find the level of influence on your query that suits your search configuration.

Also, because the Rocchio query generated for each user is likely to stay the same across that user's search session (depending on your use case), a more efficient way to personalize is to use the rf handler only to generate the Rocchio query. The handler returns the Rocchio query in the rf.query parameter of the response. If you want to use the handler only to get the query (and not execute the search), you can set the rows parameter to 0. If you set rf.interestingTerms=details, you can also iterate over the set of 'interesting terms' returned by the algorithm, along with their weights.
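A sketch of the session-caching approach described above, in Python. Everything here is an assumption for illustration: the helper names, passing the seed documents in q (as in the example request) with rows=0 to get only the generated query, reading the rf.query value through a caller-supplied fetch function, and applying the cached query to later searches as an edismax `bq` boost, which is only one option and not necessarily what the talk used.

```python
from typing import Callable, Dict, Union

Params = Dict[str, Union[str, int]]


def rf_generate_params(seed_query: str) -> Params:
    """Ask the /rf handler for the Rocchio query only: rows=0 skips returning results."""
    return {"q": seed_query, "rows": 0, "rf.interestingTerms": "details", "wt": "json"}


class SessionRocchioCache:
    """Caches one Rocchio query per search session (hypothetical helper)."""

    def __init__(self, fetch_rf_query: Callable[[str], str]) -> None:
        # fetch_rf_query is expected to call /rf with rf_generate_params(...)
        # and return the "rf.query" value from the response
        self._fetch = fetch_rf_query
        self._cache: Dict[str, str] = {}

    def rocchio_query(self, session_id: str, seed_query: str) -> str:
        if session_id not in self._cache:
            self._cache[session_id] = self._fetch(seed_query)
        return self._cache[session_id]


def personalized_search_params(user_query: str, rocchio_query: str) -> Params:
    # One possible way to apply the cached query: an additive edismax bq boost
    return {"defType": "edismax", "q": user_query, "bq": rocchio_query, "wt": "json"}
```

Because `bq` adds to the score rather than multiplying it, the relative strength of the Rocchio terms still has to be tuned, for example through rf.qf.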

Possible Improvements

Aside from making sure this works with more versions of Solr (please comment on which versions you would like), there are a number of possible improvements:

Relevancy Feedback handler: allow negative terms to be learned from negative examples (if provided; this would require a separate query parameter) and then applied using negative boosts. Another improvement would be to allow the maximum number of terms per field (rf.maxflqt) to be specified on a per-field basis, so that you can vary the maximum number of terms extracted for each field.
Unsupervised Feedback (Blind Feedback): use the Positional Relevance Model detailed in this paper: http://dl.acm.org/citation.cfm?id=1835546. This uses only the terms found near the query terms in the documents, as these are generally more relevant than terms from the whole document. The highlighter component could be used as a reference for how to get this information from the postings lists, or perhaps used directly to obtain it.
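The first improvement, learning from negative examples, corresponds to the full form of the Rocchio formula, which subtracts the centroid of non-relevant documents. Below is a textbook Rocchio sketch in Python over sparse term-weight dictionaries, not the plugin's code; the defaults alpha=1.0, beta=0.75, gamma=0.15 are commonly cited textbook values, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vec = Dict[str, float]  # term -> weight


def centroid(docs: List[Vec]) -> Vec:
    """Average term-weight vector of a set of documents."""
    if not docs:
        return {}
    total: Dict[str, float] = defaultdict(float)
    for doc in docs:
        for term, weight in doc.items():
            total[term] += weight
    return {term: weight / len(docs) for term, weight in total.items()}


def rocchio(query: Vec, relevant: List[Vec], non_relevant: List[Vec],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vec:
    """q' = alpha*q + beta*centroid(relevant) - gamma*centroid(non_relevant)."""
    result: Dict[str, float] = defaultdict(float)
    for term, weight in query.items():
        result[term] += alpha * weight
    for term, weight in centroid(relevant).items():
        result[term] += beta * weight
    for term, weight in centroid(non_relevant).items():
        result[term] -= gamma * weight
    return dict(result)


def split_by_sign(vec: Vec) -> Tuple[Vec, Vec]:
    """Positive weights become boosts; negative ones are candidates for down-weighting."""
    positive = {t: w for t, w in vec.items() if w > 0}
    negative = {t: w for t, w in vec.items() if w < 0}
    return positive, negative
```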

Contact Details

If you have a feature request, please submit it to the issues list. That is also a good place to post questions, but you can contact me at [email protected] if you don't get a response there.


More Information

Version: 1.0.0
Type: Other source code
Updated: 2025-03-12
Size: 108.25KB
Source: GitHub

Related apps

Google Dorks (2025-03-10)
shepherd (2025-06-04)
mongo express (2025-06-04)
hidusbf (2025-02-14)
Free Algorithms Books (2025-05-29)
markdownpedia (2025-04-22)

Recommended for you

chat.petals.dev (Other source code, 1.0.0)
GPT Prompt Templates (Other source code, 1.0.0)
GPTyped (Other source code, GPTyped 1.0.5)
Google Dorks (Other source code, 1.0)
shepherd (Other source code, v6.1.6-react-shepherd: Prepare Release (#3063))
mongo express (Other source code, v1.1.0-rc-3)