FunpySpiderSearchEngine
1.0.0
Personalized search built on Word2vec + Scrapy 2.3.0 (crawling data) + ElasticSearch 7.9.1 (storing data and exposing a RESTful API) + Django 3.1.1 (search website)
This repository contains the crawler-side code that stores the crawled data in ElasticSearch. For the complete search experience, combine it with the Django website project: https://github.com/mtianyan/mtianyanSearch
For the full word2vec model training process, see the README in Word2VecModel. For how word2vec is used to influence ElasticSearch scoring, see the relevant code in mtianyanSearch.
Core scoring code:
"source": "double final_score=_score;int count=0;int total = params.title_keyword.size();while(count < total) { String upper_score_title = params.title_keyword[count]; if(doc['title_keyword'].value.contains(upper_score_title)){final_score = final_score+_score;}count++;}return final_score;"
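Each keyword in params.title_keyword that is contained in the document's title_keyword field adds the base _score once more, so a document whose title contains n of the keywords scores (n + 1) × _score.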
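The scoring logic above can be mirrored in plain Python to make the arithmetic easier to check. This is only a sketch for illustration: the function name `boosted_score` and the sample values are hypothetical, and the real scoring runs inside ElasticSearch as the Painless script shown above.

```python
from typing import Sequence


def boosted_score(base_score: float, title: str, title_keywords: Sequence[str]) -> float:
    """Hypothetical Python mirror of the Painless script.

    Starts from the base score and adds the base score once more
    for every keyword that occurs in the title.
    """
    final_score = base_score
    for keyword in title_keywords:
        # String.contains in the script is a case-sensitive substring test,
        # which corresponds to the `in` operator on Python strings.
        if keyword in title:
            final_score += base_score
    return final_score


# Two of the three keywords occur in the title, so the result is 3 x 1.5.
print(boosted_score(1.5, "python scrapy tutorial", ["python", "scrapy", "java"]))  # 4.5
```

A document whose title matches none of the keywords keeps its original score, so the boost only reorders results that already matched the query.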
git clone https://github.com/mtianyan/FunpySpiderSearchEngine
# Edit the settings in config_template, then rename it to config.py
# Run sites/zhihu/es_zhihu.py
cd FunpySpiderSearchEngine
pip install -r requirements.txt
scrapy crawl zhihu
docker network create search-spider
git clone https://github.com/mtianyan/mtianyanSearch.git
cd mtianyanSearch
docker-compose up -d
git clone https://github.com/mtianyan/FunpySpiderSearchEngine
cd FunpySpiderSearchEngine
docker-compose up -d
Visit 127.0.0.1:8080
If this project is helpful to you, please buy me a pack of spicy strips!