RelevancyFeedback

Dice Relevancy Feedback

Dice.com's Solr plugins for performing personalized search and recommendations (via the relevancy feedback plugin) and conceptual / semantic search (via the unsupervised feedback plugin).

Links

Slides from the talk
Video of the talk

Building the Plugin

A pre-built JAR file can be found in the ./target folder. The project also contains a Maven pom.xml file that can be used to build it from source.

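Supported Solr Versions

Solr 5.4 (see branch)
Solr 6.3 (see branch) - also master
Solr 7.0 (see branch) - also works in 7.1

If there is a particular version of Solr you need this for, please create a GitHub issue and I'll see what I can do. To compile it manually for a specific version, update the versions of the Solr and Lucene libraries in the pom.xml file, use Maven to pull in those dependencies and compile the plugins, and then fix any compilation errors.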

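Importing into Solr

Please see the official Solr guidelines for registering plugins with Solr. This basically involves dropping the JAR file into one of the folders that Solr checks for class and JAR files on core reload.

Solr Plugins
Adding custom plugins in Solr Cloud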

Relevancy Feedback Plugin

An example request handler configuration for solrconfig.xml is shown below, with comments outlining the main parameters.

 <requestHandler name="/rf" class="org.dice.solrenhancements.relevancyfeedback.RelevancyFeedbackHandler">
        <lst name="defaults">
            <str name="omitHeader">true</str>
            <str name="wt">json</str>
            <str name="indent">true</str>			
	
            <!-- Regular query configuration - query parser used when parsing the rf query-->
            <str  name="defType">lucene</str>
            
            <!-- fields returned -->
            <str  name="fl">jobTitle,skill,company</str>
            <!-- fields to match on-->
            <str  name="rf.fl">skillFromSkill,extractTitles</str>
            
            <!-- field weights. Note that term weights are normalized so that each field is weighted exactly in this ratio
            as different fields can get different numbers of matching terms-->
            <str  name="rf.qf">skillFromSkill^3 extractTitles^4.5</str>
            
            <int  name="rows">10</int>
            
            <!-- How many terms to extract per field (max) -->
            <int  name="rf.maxflqt">10</int>
            <bool name="rf.boost">true</bool>
            
            <!-- normalize the weights for terms in each field (custom to dice rf, not present in solr MLT) -->
            <bool name="rf.normflboosts">true</bool>
            
            <!-- Take the raw term frequencies (false) or log of the term frequencies (true) -->
            <bool name="rf.logtf">true</bool>
            
            <!-- Minimum should match settings for the rf query - determines what proportion of the terms have to match -->
            <!-- See Solr edismax mm parameter for specifics --> 
            <str  name="rf.mm">25%</str>
            
            <!-- Returns the top k terms (see regular solr MLT handler) -->
            <str  name="rf.interestingTerms">details</str>
            
            <!-- Turns the rf query into a boost query using a multiplicative boost, allowing for boosting -->
            <str  name="rf.boostfn"></str>
            
            <!-- q parameter - If you want to execute one query, and use the rf query to boost the results (e.g. for personalizing search),
                 pass the user's query into this parameter. Can take regular query syntax e.g. rf.q={!edismax df=title qf=.... v=$qq}&qq=Java
                 The regular q parameter is reserved for the rf query (see above)
            -->
            <str name="q"></str>
            <!-- Query parser to use for the q query, if passed -->
            <str name="defType"></str>

            <!-- rf.q parameter - Note that the regular q parameter is used only for personalized search scenarios, where you have a main query
            	 and you want to use the rf query generated to boost the main queries documents. Typically the rf.q query is a query that identifies
                 one or more documents, e.g. rf.q=(id:686867 id:98928980 id:999923). But it can be any query. Note that it will process ALL documents
            	 matched by the q query (as it's intended mainly for looking up docs by id), so use cautiously.
            -->
            <str name="rf.q"></str>
            <!-- query parser to use for the rf.q query -->
            <str name="rf.defType"></str>
            
            <!-- Settings for personalized search - use the regular parameter names for the query parser defined by defType parameter -->
            <str name="df">title</str>
            <str name="qf"> company_text^0.01 title^12 skill^4 description^0.3</str>
            <str name="pf2">company_text^0.01 title^12 skill^4 description^0.6</str> 
            
            <!-- Content based recommendations settings (post a document to the endpoint in a POST request). The stream.body and stream.head are form parameters
                 You can send them in a GET request, but a POST handles larger data. If you have really large documents, you will need to change the buffer settings
                 so that the request doesn't blow the buffer limits in Solr or your web server.
             -->
            
            <!-- Fields used for processing documents posted to the stream.body and stream.head parameters in a POST call -->
            <str  name="stream.head.fl">title,title_syn</str>
            <str  name="stream.body.fl">extractSkills,extractTitles</str>
            
            <!-- pass a url in this parameter for Solr to download the webpage, and process the Html using the fields configured in the stream.qf parameters -->
            <str  name="stream.url"></str>
            
            <!-- Note that we have two different content stream fields to pass over in the POST request. This allows different analyzers to be applied to each.
            For instance, we pass the job title into the stream.head field and parse out job titles, while we pass the job description to the stream.body parameter
            to parse out skills -->
            <!-- Pass the document body in this parameter as a form parameter. Analysed using the stream.body.fl fields-->
            <str  name="stream.body"></str>
            <!-- Pass the second document field in this parameter. Analysed using the stream.head.fl fields-->
            <str  name="stream.head"></str>
            
            <!-- Specifies a separate set of field weights to apply when processing a document posted to the request handler via the
                 stream.body and stream.head parameters -->
            <str  name="stream.qf">extractSkills^4.5 extractTitles^2.25 title^3.0 title_syn^3.0</str>           
        </lst>
</requestHandler>
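
As a rough illustration of the content-based recommendation settings in the configuration above, the following Python sketch builds (but does not send) a form-encoded POST request that passes a job title in stream.head and a job description in stream.body. The host, the core name (jobs), and the sample text are assumptions for illustration; only the /rf handler name and the parameter names come from the configuration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint: the host and core name are placeholders, not taken from the plugin.
RF_ENDPOINT = "http://localhost:8983/solr/jobs/rf"


def build_recommendation_request(title: str, description: str, rows: int = 10) -> Request:
    """Build a form-encoded POST that posts a document to the rf handler."""
    form = {
        "stream.head": title,        # analysed with the stream.head.fl fields
        "stream.body": description,  # analysed with the stream.body.fl fields
        "rows": rows,
        "wt": "json",
    }
    data = urlencode(form).encode("utf-8")
    return Request(
        RF_ENDPOINT,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_recommendation_request(
    "Machine Learning Engineer",
    "Build NLP models in Python and Java.",
)
# To send it against a running Solr instance: urllib.request.urlopen(request)
```

A POST is used here because, as the configuration comments note, it handles larger documents than a GET.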

Example Request

http://localhost:8983/solr/jobs/rf?q=id:11f407d319d6cc707437fad874a097c0+id:a2fd2f2e34667d61fadcdcabfd359cf4&rows=10&df=title&fl=title,skills,geocode,city,state&wt=json

Example Response

 {
  "match":{
      "numFound":2,
      "start":0,
      "docs":[
          {
            "id":"a2fd2f2e34667d61fadcdcabfd359cf4",        
            "title":"Console AAA Sports Video Game Programmer.",
            "skills":["Sports Game Experience a plus.",
              "2-10 years plus Console AAA Video Game Programming Experience"],
            "geocode":"38.124447,-122.55051",
            "city":"Novato",
            "state":"CA"
          },
          {
            "id":"11f407d319d6cc707437fad874a097c0",
            "title":"Game Engineer - Creative and Flexible Work Environment!",
            "skills":["3D Math",
              "Unity3d",
              "C#",
              "3D Math - game programming",
              "game programming",
              "C++",
              "Java"],
            "geocode":"33.97331,-118.243614",
            "city":"Los Angeles",
            "state":"CA"
          }
      ]
  },
  "response":{
      "numFound":5333,
      "start":0,
      "docs":[
          {
            "title":"Software Design Engineer 3 (Game Developer)",
            "skills":["C#",
              "C++",
              "Unity"],
            "geocode":"47.683647,-122.12183",
            "city":"Redmond",
            "state":"WA"
          },
          {          
            "title":"Game Server Engineer - MMO Mobile Gaming Start-Up!",
            "skills":["AWS",
              "Node.JS",
              "pubnub",
              "Websockets",
              "pubnub - Node.JS",
              "Vagrant",
              "Linux",
              "Git",
              "MongoDB",
              "Jenkins",
              "Docker"],
            "geocode":"37.777115,-122.41733",
            "city":"San Francisco",
            "state":"CA"
          },...
      ]
   }
}

Unsupervised Feedback (Blind Feedback) Plugin

An example request handler configuration for solrconfig.xml is shown below, with comments outlining the main parameters.

 <requestHandler name="/ufselect" class="org.dice.solrenhancements.unsupervisedfeedback.UnsupervisedFeedbackHandler">
        <lst name="defaults">
            <str name="omitHeader">true</str>
            <str name="wt">json</str>
            <str name="indent">true</str>

            <!-- Regular query configuration -->
            <str  name="defType">edismax</str>
            <str  name="df">title</str>
            <str  name="qf">title^1.5   skills^1.25 description^1.1</str>
            <str  name="pf2">title^3.0  skills^2.5  description^1.5</str>
            <str  name="mm">1</str>
            <str  name="q.op">OR</str>

            <str  name="fl">jobTitle,skills,company</str>
            <int  name="rows">30</int>
                        
            <!-- Unsupervised Feedback (Blind Feedback) query configuration-->
            <str  name="uf.fl">skillsFromskills,titleFromJobTitle</str>
            <!-- How many docs to extract the top terms from -->
            <str  name="uf.maxdocs">50</str>
            <!-- How many terms to extract per field (max) -->
            <int  name="uf.maxflqt">10</int>
            <bool name="uf.boost">true</bool>
            <!-- Relative per-field boosts on the extracted terms (similar to edismax qf parameter) -->
            <!-- NOTE: with  uf.normflboosts=true, all terms are normalized so that the total importance of each
            	field on the query is the same, then these relative boosts are applied per field-->
            
            <str  name="uf.qf">skillsFromskills^4.5 titleFromJobTitle^6.0</str>
            
            <!-- Returns the top k terms (see regular solr MLT handler) -->
            <str  name="uf.interestingTerms">details</str>
			
            <!-- unit-length norm all term boosts within a field (recommended) - see talk for details -->
            <bool name="uf.normflboosts">true</bool>
            <!-- use raw term counts or log term counts? -->
            <bool name="uf.logtf">false</bool>
        </lst>
</requestHandler>
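
To make the role of uf.maxdocs, uf.maxflqt, and uf.logtf more concrete, here is a minimal Python sketch of the general blind-feedback idea: take the top-ranked documents for a query and collect the most frequent terms per field. It is not the plugin's implementation. It works on in-memory toy documents, scores terms by raw or log term counts only, and the function name, the data layout, and the log(1 + tf) form are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Doc = Dict[str, List[str]]  # field name -> already-analysed terms (assumed layout)


def extract_feedback_terms(
    ranked_docs: List[Doc],
    fields: List[str],
    max_docs: int = 50,
    max_terms_per_field: int = 10,
    log_tf: bool = False,
) -> Dict[str, List[Tuple[str, float]]]:
    """Collect the most frequent terms per field from the top-ranked documents."""
    top_docs = ranked_docs[:max_docs]
    result: Dict[str, List[Tuple[str, float]]] = {}
    for field in fields:
        counts: Counter = Counter()
        for doc in top_docs:
            counts.update(doc.get(field, []))
        # Assumed scoring: raw count, or log(1 + count) when log_tf is set.
        scored = [
            (term, math.log(1 + tf) if log_tf else float(tf))
            for term, tf in counts.items()
        ]
        # Highest score first; ties broken alphabetically for determinism.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        result[field] = scored[:max_terms_per_field]
    return result


docs = [
    {"skills": ["java", "python"]},
    {"skills": ["java"]},
    {"skills": ["scala"]},
]
terms = extract_feedback_terms(docs, ["skills"], max_docs=2, max_terms_per_field=2)
print(terms)  # {'skills': [('java', 2.0), ('python', 1.0)]}
```

Only the first two documents are inspected in this example, so "scala" never becomes a candidate term.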

Example Request

http://localhost:8983/solr/dicejobscp/ufselect?q=machine+learning+engineer&start=0&rows=10&uf.logtf=false&fl=title,skills,geocode,city,state&fq={!geofilt+sfield=jobendecageocode+d=48+pt=39.6955,-105.0841}&wt=json

Example Response

 {
  "match":
  {
    "numFound":7729,
    "start":0,
    "docs":[
      {
        "title":"NLP/Machine Learning Engineer",
        "skills":["Linux",
          "NLP (Natural Language Processing)",
          "SQL",
          "Bash",
          "Python",
          "ML (Machine Learning)",
          "JavaScript",
          "Java"],
        "geocode":"42.35819,-71.050674",
        "city":"Boston",
        "state":"MA"
      },
      {
        "title":"Machine Learning Engineer",
        "skills":["machine learning",
          "java",
          "scala"],
        "geocode":"47.60473,-122.32594",
        "city":"Seattle",
        "state":"WA"
      },
      {
        "title":"Machine Learning Engineer - REMOTE!",
        "skills":["Neo4j",
          "Hadoop",
          "gensim",
          "gensim - C++",
          "Java",
          "R",
          "MongoDB",
          "elastic search",
          "sci-kit learn",
          "Python",
          "C++"],
        "geocode":"37.777115,-122.41733",
        "city":"San Francisco",
        "state":"CA"
        },...
    ]
  }
}

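Isn't this just the MLT Handler?

While it is loosely based on the Solr MLT handler code and algorithm (which is just the Rocchio algorithm), there are some key differences in the algorithm design.

The MLT handler takes the top k terms across all configured fields when constructing the MLT query. If one field has a broader vocabulary than the others, the average document frequency of its terms will be lower than in fields with smaller vocabularies. Those terms therefore have higher relative IDF scores and tend to dominate the top terms selected by the Solr MLT handler. Our request handler takes the top k terms per field.

It also ensures that, no matter how many terms are matched per field (up to the configured limit), each field has the same weighting in the resulting query as all other fields, before the field-specific weights specified in the rf.qf parameter are applied. This is the second problem with the Solr MLT handler that we address.

We also provide a lot of extra functionality. We allow content streams to be passed in, matching against multiple documents (more like 'these' as opposed to more like 'this'), and applying the boost query parser to the resulting MLT query so that any arbitrary (multiplicative) Solr boost can be applied. We also support the mm parameter, so we can require that returned documents match at least a set percentage of the top terms.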
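
The per-field selection and normalization described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the plugin's code: it assumes each field already has a score per candidate term, keeps the top k terms per field, scales each field's scores to unit length (as the uf.normflboosts comment describes), and then applies the field weight from rf.qf. The function name and input layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def build_field_boosts(
    term_scores: Dict[str, Dict[str, float]],
    field_weights: Dict[str, float],
    max_terms_per_field: int = 10,
) -> Dict[str, List[Tuple[str, float]]]:
    """Pick the top terms per field, unit-normalise them, then apply field weights."""
    boosts: Dict[str, List[Tuple[str, float]]] = {}
    for field, scores in term_scores.items():
        # Top k terms for this field only, so no single field can crowd out the others.
        top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:max_terms_per_field]
        norm = math.sqrt(sum(score * score for _, score in top))
        if norm == 0.0:
            continue  # nothing usable in this field
        weight = field_weights.get(field, 1.0)
        # After dividing by the norm every field carries the same total importance;
        # the field weight then sets the intended ratio between fields.
        boosts[field] = [(term, weight * score / norm) for term, score in top]
    return boosts


example = build_field_boosts(
    {
        "skillFromSkill": {"python": 4.0, "java": 3.0},
        "extractTitles": {"engineer": 10.0},
    },
    {"skillFromSkill": 3.0, "extractTitles": 4.5},
)
```

In this example the single title term ends up with a boost equal to its field weight, while the two skill terms share their field's weight in proportion to their scores, regardless of how large the raw scores were.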

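Important Considerations When Using for Personalized Search

If you wish to use this to perform search personalization, as demonstrated in my Lucene Revolution 2017 talk, you need to pass the user's current search query in the regular q parameter, while the information used to generate the Rocchio query is passed via the rf.q parameter (when using documents to generate the Rocchio query) or via the content stream parameters (stream.head and stream.body, as configured above).

However, the boosts applied to the terms in the Rocchio query are not of comparable weight to those in the user query, because of the normalization process that the algorithm applies. You will therefore need to experiment with different rf.qf values until you find the right level of influence on your query, given your search configuration.

Also, the Rocchio query generated for each user is likely to be the same throughout the user's search session (depending on your use case, of course). A more efficient way to do personalization is therefore to use the RF handler to generate the Rocchio query once, when the user logs in, cache this query, and then use it as a boost query for personalization within your regular search request handler. The handler returns the Rocchio query in the rf.query parameter of the response. You can set the rows parameter to 0 if you only want the handler to produce the query (and not execute the search). If you set rf.interestingTerms=details, you can also iterate over the set of 'interesting terms' returned by the algorithm, along with their weights, and use them to build your boost query.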
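
A minimal Python sketch of the caching approach described above follows. It builds a rows=0 request URL that asks the handler only for the Rocchio query and interesting terms, and it turns a list of weighted terms into a boost query string that could be cached per user. The handler URL, both function names, and the (field, term, weight) tuple layout are assumptions; the exact shape of the interesting-terms output in the response is not specified here, so it is modelled as a plain list.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlencode


def rocchio_query_url(handler_url: str, doc_ids: Iterable[str]) -> str:
    """Build a request that returns the generated query without running a search."""
    params = {
        "rf.q": " ".join(f"id:{doc_id}" for doc_id in doc_ids),
        "rows": 0,  # only generate the query, return no documents
        "rf.interestingTerms": "details",
        "wt": "json",
    }
    return f"{handler_url}?{urlencode(params)}"


def boost_query(terms: List[Tuple[str, str, float]]) -> str:
    """Turn (field, term, weight) tuples into a boost query, highest weight first."""
    clauses = []
    for field, term, weight in sorted(terms, key=lambda t: -t[2]):
        # Quote every term and escape backslashes and quotes inside it.
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{field}:"{escaped}"^{weight:g}')
    return " ".join(clauses)


url = rocchio_query_url("http://localhost:8983/solr/jobs/rf", ["1", "2"])
cached_boost = boost_query([("skill", "java", 1.5), ("title", "machine learning", 2.0)])
print(cached_boost)  # title:"machine learning"^2 skill:"java"^1.5
```

The resulting string would be generated once per session and then attached to each regular search request as a boost query, rather than calling the RF handler on every search.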

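Potential Enhancements

Aside from ensuring this works with more versions of Solr (please leave feedback as to which versions you want), there are a number of possible enhancements:

Relevancy Feedback Handler: Allow the handler to learn negative terms from negative examples (if provided, which would require a separate query parameter) and implement these using negative boosts. Another enhancement is to allow the maximum terms per field (rf.maxflqt) to be specified per field, so that the maximum number of extracted terms can vary by field.
Unsupervised Feedback (Blind Feedback): Use the positional relevance model detailed in this paper: http://dl.acm.org/citation.cfm?id=1835546. This uses only terms found near the query's terms in the document, as these are generally more relevant than terms drawn from the whole document. The highlighter component could probably be used as a reference for how to get this information out of the postings list, or possibly even be used directly to obtain it.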

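Contact Details

If you have a feature request, please submit it to the issues list. That is also a good place to post questions, but if you don't hear back there you can also reach out to me at [email protected].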

Additional Information

Version 1.0.0
Type Other source code
Updated 2025-03-12
Size 108.25KB
Source GitHub
