FunpySpiderSearchEngine
1.0.0
Personalized search with Word2vec + Scrapy 2.3.0 (crawling data) + Elasticsearch 7.9.1 (storing data and serving an external RESTful API) + Django 3.1.1 (search site)
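This repository holds the crawler-side code that loads data into Elasticsearch. The complete search system also requires the Django website project https://github.com/mtianyan/mtianyanSearch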
For the full word2vec model training process, see the README of the Word2VecModel project. For how word2vec is used to influence Elasticsearch scoring, see the relevant code in mtianyanSearch.
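The scoring script below receives a list of related words. Presumably these come from the word2vec model as the nearest neighbors of the user's query term. The sketch illustrates that lookup with cosine similarity over a toy embedding table. `TOY_VECTORS` and `related_words` are hypothetical stand-ins. With a real trained model, a library call such as gensim's `KeyedVectors.most_similar` would play this role; nothing here is taken from Word2VecModel or mtianyanSearch.

```python
import math

# Hypothetical 3-dimensional vectors standing in for a trained word2vec model.
TOY_VECTORS = {
    "python": [0.9, 0.1, 0.0],
    "django": [0.8, 0.2, 0.1],
    "scrapy": [0.85, 0.05, 0.2],
    "cooking": [0.0, 0.9, 0.3],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def related_words(word, vectors, topn=2):
    """Return the `topn` words closest to `word`, excluding `word` itself."""
    if word not in vectors:
        return []
    target = vectors[word]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return [w for w, _ in scored[:topn]]


print(related_words("python", TOY_VECTORS))  # ['django', 'scrapy']
```

The resulting list is what would be passed to Elasticsearch as `params.title_keyword`.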
Core scoring code (an Elasticsearch Painless script):
"source": "double final_score=_score;int count=0;int total = params.title_keyword.size();while(count < total) { String upper_score_title = params.title_keyword[count]; if(doc['title_keyword'].value.contains(upper_score_title)){final_score = final_score+_score;}count++;}return final_score;"
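For each related word that the title contains (a substring check with `contains`), the original `_score` is added once more. A title matching n related words therefore scores (n + 1) × the original score.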
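To make the arithmetic concrete, the sketch mirrors the script's loop in plain Python. It also builds a hypothetical request body that passes the related words as `params.title_keyword`. The `function_score` wrapper, the `match` on a `title` field and `boost_mode: "replace"` are assumptions, not taken from mtianyanSearch. The sketch uses `"replace"` because the script already folds `_score` into its result, and the default `multiply` mode would apply the query score a second time.

```python
# The Painless script from above, copied verbatim as a Python string.
SCRIPT_SOURCE = (
    "double final_score=_score;int count=0;int total = params.title_keyword.size();"
    "while(count < total) { String upper_score_title = params.title_keyword[count]; "
    "if(doc['title_keyword'].value.contains(upper_score_title)){final_score = final_score+_score;}"
    "count++;}return final_score;"
)


def painless_equivalent(base_score, title_keyword, related):
    """Python mirror of the script: add base_score once per related word found in the title."""
    final_score = base_score
    for word in related:
        # Painless String.contains and Python `in` are both case-sensitive substring tests.
        if word in title_keyword:
            final_score += base_score
    return final_score


def build_query(user_query, related):
    """Hypothetical request body; field names and query structure are assumptions."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": user_query}},
                "script_score": {
                    "script": {
                        "source": SCRIPT_SOURCE,
                        "params": {"title_keyword": list(related)},
                    }
                },
                # The script already returns a score that includes _score, so replace
                # the query score instead of multiplying it in again.
                "boost_mode": "replace",
            }
        }
    }
```

For example, `painless_equivalent(1.5, "Scrapy and Elasticsearch tutorial", ["Scrapy", "Django", "Elasticsearch"])` returns 4.5. Two of the three related words appear in the title, so the base score 1.5 is counted three times.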
Running locally:
git clone https://github.com/mtianyan/FunpySpiderSearchEngine
cd FunpySpiderSearchEngine
pip install -r requirements.txt
# Edit the settings in config_template, then rename it to config.py
# Run sites/zhihu/es_zhihu.py
scrapy crawl zhihu
Running with Docker:
docker network create search-spider
git clone https://github.com/mtianyan/mtianyanSearch.git
cd mtianyanSearch
docker-compose up -d
git clone https://github.com/mtianyan/FunpySpiderSearchEngine
cd FunpySpiderSearchEngine
docker-compose up -d
Then open http://127.0.0.1:8080 in your browser.
If this project's code helps you, consider buying me a pack of latiao (spicy strips)!