This article does not cover ElasticSearch internals; it only records how to quickly import MySQL data into ElasticSearch for full-text search.
At work I needed to implement a search feature over existing database data, and my team lead recommended ElasticSearch. The tutorials I could find online were all fairly dated, so I had no choice but to explore on my own, working from the ES documentation until the service finally came together. I'm writing the process down in the hope that readers with the same need can skip the detours and quickly stand up a working ElasticSearch service by following this tutorial.
Setting up ES
You can run ES either from a downloaded archive or as a Docker container. Docker is the more convenient option: it makes it easy to build a cluster or spin up a throwaway test environment, so that is the approach used here. First we need a Dockerfile:
```dockerfile
FROM docker.elastic.co/elasticsearch/elasticsearch-oss:6.0.0

# Copy configuration, including the new elasticsearch.yml and keystore.jks
COPY --chown=elasticsearch:elasticsearch conf/ /usr/share/elasticsearch/config/

# Install ik
RUN ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.0.0/elasticsearch-analysis-ik-6.0.0.zip

# Install readonlyrest
RUN ./bin/elasticsearch-plugin install https://github.com/HYY-yu/BezierCurveDemo/raw/master/readonlyrest-1.16.14_es6.0.0.zip

USER elasticsearch
CMD ./bin/elasticsearch
```
A few notes on the steps above: the conf/ directory carries the new elasticsearch.yml plus the keystore.jks used for SSL, the ik plugin provides Chinese word segmentation, and readonlyrest provides authentication, access control, and SSL.
Elastic configuration: elasticsearch.yml
```yaml
cluster.name: "docker-cluster"
network.host: 0.0.0.0

# minimum_master_nodes need to be explicitly set when bound on a public IP
# set to 1 to allow single node clusters
# Details: https://github.com/elastic/elasticsearch/pull/17288
discovery.zen.minimum_master_nodes: 1

# Forbid the system from swapping out ES memory
bootstrap.memory_lock: true

http.type: ssl_netty4
readonlyrest:
  enable: true
  ssl:
    enable: true
    keystore_file: "server.jks"
    keystore_pass: server
    key_pass: server
  access_control_rules:
    - name: "Block 1 - ROOT"
      type: allow
      groups: ["admin"]
    - name: "User read only - paper"
      groups: ["user"]
      indices: ["paper*"]
      actions: ["indices:data/read/*"]
  users:
    - username: root
      auth_key_sha256: cb7c98bae153065db931980a13bd45ee3a77cb8f27a7dfee68f686377acc33f1
      groups: ["admin"]
    - username: xiaoming
      auth_key: xiaoming:xiaoming
      groups: ["user"]
```
One gotcha is bootstrap.memory_lock: true, which forbids swapping. As the documentation explains, some operating systems will swap temporarily unused memory out to disk at runtime; for ES this can make resource usage spike and can even leave the node unresponsive, so swapping is disabled here.
The access rules in the configuration file are straightforward: the root user belongs to the admin group, which has full permissions, while xiaoming is in the user group and can therefore only read the paper* indices, with no write access. For more detailed configuration options, see the readonlyrest documentation.
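The readonlyrest docs describe auth_key_sha256 as the hex-encoded SHA-256 digest of the string "username:password". A minimal Go sketch for generating such a digest (the credentials below are placeholders, not the ones behind the hash in the config above):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// authKeySHA256 returns the hex SHA-256 digest of "username:password",
// the format readonlyrest expects for auth_key_sha256.
func authKeySHA256(username, password string) string {
	sum := sha256.Sum256([]byte(username + ":" + password))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Paste the output into elasticsearch.yml as auth_key_sha256.
	fmt.Println(authKeySHA256("root", "secret"))
}
```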
At this point the preparation for ES is complete. Build and run (note that Docker image names must be lowercase):

```shell
docker build -t es-image:tag .
docker run -p 9200:9200 es-image:tag
```
If https://127.0.0.1:9200/ returns:

```json
{
  "name": "VaKwrIR",
  "cluster_name": "docker-cluster",
  "cluster_uuid": "YsYdOWKvRh2swz907s2m_w",
  "version": {
    "number": "6.0.0",
    "build_hash": "8f0685b",
    "build_date": "2017-11-10T18:41:22.859Z",
    "build_snapshot": false,
    "lucene_version": "7.0.1",
    "minimum_wire_compatibility_version": "5.6.0",
    "minimum_index_compatibility_version": "5.0.0"
  },
  "tagline": "You Know, for Search"
}
```

then the protagonist of this tutorial has arrived. Here are several commonly used APIs for poking at and debugging ES:
{{url}} is replaced with your local ES address.
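These are standard ES 6.x endpoints that are handy for this kind of debugging (papers here stands for whatever index name you use):

```
GET {{url}}/                          # node name, cluster name, version info
GET {{url}}/_cluster/health           # cluster status (green / yellow / red)
GET {{url}}/_cat/indices?v            # list indices with doc counts and sizes
GET {{url}}/papers/_mapping           # inspect an index's mapping
GET {{url}}/papers/_search?q=Beijing  # quick query-string search
```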
Importing MySQL data
I am using MySQL here, but other databases work the same way; the key question is how to import. Online tutorials tend to recommend Logstash, Beats, or an ES mysql plugin for the import. I tried that route: the configuration is cumbersome, the documentation is sparse, and if the database schema is at all complicated the import becomes a laborious task, so I don't recommend it. In fact ES has a client library for every major language, so you can assemble the data into JSON in your own code and send it to ES through the client. The process is roughly as follows:
I use elastic, a Golang ES client. Clients for other languages can be found on GitHub, and they all work the same way.
Next, let's walk through it with a simple schema:
Paper Table
| id | name |
|---|---|
| 1 | Beijing No. 1 Primary School Simulation |
| 2 | Jiangxi Beijing General College Entrance Examination Questions |
Province table
| id | name |
|---|---|
| 1 | Beijing |
| 2 | Jiangxi |
Paper_Province table
| paper_id | province_id |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 2 | 2 |
As shown above, Paper and Province have a many-to-many relationship. We want to index the Paper data into ES so that papers can be fuzzy-searched by name and filtered by Province. The JSON document format looks like this:

```json
{
  "id": 1,
  "name": "Beijing No. 1 Primary School Simulation Volume",
  "provinces": [
    { "id": 1, "name": "Beijing" }
  ]
}
```

First prepare a mapping.json file, which defines how the data is stored in ES.
```json
{
  "mappings": {
    "docs": {
      "include_in_all": false,
      "properties": {
        "id": { "type": "long" },
        "name": {
          "type": "text",
          "analyzer": "ik_max_word"
        },
        "provinces": {
          "type": "nested",
          "properties": {
            "id": { "type": "integer" },
            "name": { "type": "text", "index": false }
          }
        }
      }
    }
  },
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}
```

The name field uses ik_max_word, the most fine-grained ik analyzer, while provinces.name is stored but not indexed. Note also that the _all field is disabled: by default _all collects every stored field so that searches can run without naming a field, and the downside is the extra space it occupies.
I set the number of shards to 1 with no replicas; after all this is not a cluster, and the data volume is small. If you have a large amount of data to handle, tune the shard and replica counts yourself.
First, establish a connection to ES. The ca.crt file relates to the self-signed jks certificate; here I simply use InsecureSkipVerify to skip certificate verification.
```go
func InitElasticSearch() {
	pool := x509.NewCertPool()
	crt, err0 := ioutil.ReadFile("conf/ca.crt")
	if err0 != nil {
		cannotOpenES(err0, "read crt file err")
		return
	}
	pool.AppendCertsFromPEM(crt)

	tr := &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: pool, InsecureSkipVerify: true},
	}
	httpClient := &http.Client{Transport: tr}

	// Construct elasticClient
	var err error
	elasticClient, err = elastic.NewClient(
		elastic.SetURL(MyConfig.ElasticUrl),
		elastic.SetErrorLog(GetLogger()),
		elastic.SetGzip(true),
		elastic.SetHttpClient(httpClient),
		elastic.SetSniff(false), // cluster sniffing; remember to disable it on a single node
		elastic.SetScheme("https"),
		elastic.SetBasicAuth(MyConfig.ElasticUsername, MyConfig.ElasticPassword))
	if err != nil {
		cannotOpenES(err, "search_client_error")
		return
	}

	// elasticClient is ready; check whether the paper index already exists
	exist, err := elasticClient.IndexExists(MyConfig.ElasticIndexName).Do(context.Background())
	if err != nil {
		cannotOpenES(err, "exist_paper_index_check")
		return
	}
	// If the index exists and passes the integrity check, no data needs to be sent
	if exist {
		if !isIndexIntegrity(elasticClient) {
			// Delete the current index and prepare to rebuild it
			deleteResponse, err := elasticClient.DeleteIndex(MyConfig.ElasticIndexName).Do(context.Background())
			if err != nil || !deleteResponse.Acknowledged {
				cannotOpenES(err, "delete_index_error")
				return
			}
		} else {
			return
		}
	}
	// Query the database and send the data to ES in the background
	go fetchDBGetAllPaperAndSendToES()
}

type PaperSearch struct {
	PaperId   int64      `gorm:"primary_key;column:F_paper_id;type:BIGINT(20)" json:"id"`
	Name      string     `gorm:"column:F_name;size:80" json:"name"`
	Provinces []Province `gorm:"many2many:t_paper_province;" json:"provinces"` // provinces the paper applies to
}

func fetchDBGetAllPaperAndSendToES() {
	// Fetch all papers
	var allPaper []PaperSearch
	GetDb().Table("t_papers").Find(&allPaper)

	// Fetch the provinces of each paper
	for i := range allPaper {
		var allPro []Province
		GetDb().Table("t_provinces").
			Joins("INNER JOIN `t_paper_province` ON `t_paper_province`.`province_F_province_id` = `t_provinces`.`F_province_id`").
			Where("t_paper_province.paper_F_paper_id = ?", allPaper[i].PaperId).
			Find(&allPro)
		allPaper[i].Provinces = allPro
	}

	if len(allPaper) == 0 {
		return
	}

	// Send to ES: first create the index.
	// index_default_setting holds the content of mapping.json above.
	createService := GetElasticSearch().CreateIndex(MyConfig.ElasticIndexName)
	createService.Body(index_default_setting)
	createResult, err := createService.Do(context.Background())
	if err != nil {
		cannotOpenES(err, "create_paper_index")
		return
	}
	if !createResult.Acknowledged || !createResult.ShardsAcknowledged {
		cannotOpenES(err, "create_paper_index_fail")
	}

	// Then send all papers in one bulk request
	bulkRequest := GetElasticSearch().Bulk()
	for i := range allPaper {
		indexReq := elastic.NewBulkIndexRequest().OpType("create").
			Index(MyConfig.ElasticIndexName).Type("docs").
			Id(helper.Int64ToString(allPaper[i].PaperId)).
			Doc(allPaper[i])
		bulkRequest.Add(indexReq)
	}
	// Do sends the bulk requests to Elasticsearch
	bulkResponse, err := bulkRequest.Do(context.Background())
	if err != nil {
		cannotOpenES(err, "insert_docs_error")
		return
	}
	// Bulk request actions get cleared after Do
	if len(bulkResponse.Created()) != len(allPaper) {
		cannotOpenES(err, "insert_docs_nums_error")
		return
	}
	// Send succeeded
}
```

After running the code above, check {{url}}/_cat/indices?v to see whether the newly created index appears in ES, and {{url}}/papers/_search to see how many documents it holds. If the document count equals the amount of data you sent over, the search service can be considered up and running.
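One caveat about sending everything as a single bulk call: ES limits the HTTP request body size (http.max_content_length, 100 MB by default), so for larger tables it is safer to split the documents into batches and issue one bulk request per batch. A hypothetical chunking helper, not part of the original code, might look like this:

```go
package main

import "fmt"

// chunk splits ids into batches of at most size elements, so each batch
// can be sent as its own bulk request instead of one giant request.
func chunk(ids []int64, size int) [][]int64 {
	var batches [][]int64
	for len(ids) > size {
		batches = append(batches, ids[:size])
		ids = ids[size:]
	}
	if len(ids) > 0 {
		batches = append(batches, ids)
	}
	return batches
}

func main() {
	batches := chunk([]int64{1, 2, 3, 4, 5}, 2)
	fmt.Println(len(batches)) // [1 2], [3 4], [5] -> 3 batches
}
```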
Search
You can now search papers by province ID and a query string q; by default the results are sorted by relevance score.
```go
// q: search string; provinceId: province filter; limit, page: pagination parameters
func SearchPaper(q string, provinceId uint, limit int, page int) (list []PaperSearch, totalPage int, currentPage int, pageIsEnd int, returnErr error) {
	// If ES cannot be used, fall back to database search
	if !CanUseElasticSearch && !MyConfig.UseElasticSearch {
		return SearchPaperLocal(q, provinceId, limit, page)
	}
	list = make([]PaperSearch, 0)
	totalPage = 0
	currentPage = page
	pageIsEnd = 0
	returnErr = nil

	client := GetElasticSearch()
	if client == nil {
		return SearchPaperLocal(q, provinceId, limit, page)
	}
	// Something is wrong with ElasticSearch: fall back to database search
	if !isIndexIntegrity(client) {
		return SearchPaperLocal(q, provinceId, limit, page)
	}
	if !client.IsRunning() {
		client.Start()
	}
	defer client.Stop()

	q = html.EscapeString(q)
	boolQuery := elastic.NewBoolQuery()

	// Match against Paper.name
	matchQuery := elastic.NewMatchQuery("name", q)

	// Filter by Province via a nested query
	if provinceId > 0 && provinceId != DEFAULT_PROVINCE_ALL {
		proBool := elastic.NewBoolQuery()
		tpro := elastic.NewTermQuery("provinces.id", provinceId)
		proNest := elastic.NewNestedQuery("provinces", proBool.Must(tpro))
		boolQuery.Must(proNest)
	}
	boolQuery.Must(matchQuery)

	// Highlight matches in the name field
	highlight := elastic.NewHighlight()
	highlight.Field(ELASTIC_SEARCH_SEARCH_FIELD_NAME)
	highlight.PreTags(ELASTIC_SEARCH_SEARCH_FIELD_TAG_START)
	highlight.PostTags(ELASTIC_SEARCH_SEARCH_FIELD_TAG_END)

	searchResult, err2 := client.Search(MyConfig.ElasticIndexName).
		Highlight(highlight).
		Query(boolQuery).
		From((page - 1) * limit).
		Size(limit).
		Do(context.Background())
	if err2 != nil {
		// Handle the error
		GetLogger().LogErr("Error occurred while searching: "+err2.Error(), "search_error")
		returnErr = errors.New("error occurred during search")
	} else if searchResult.Hits.TotalHits > 0 {
		// Iterate through the results
		for _, hit := range searchResult.Hits.Hits {
			var p PaperSearch
			err := json.Unmarshal(*hit.Source, &p)
			if err != nil {
				// Deserialization failed
				GetLogger().LogErr("Error occurred during search: "+err.Error(), "search_deserialization_error")
				returnErr = errors.New("error occurred during search")
				return
			}
			if len(hit.Highlight[ELASTIC_SEARCH_SEARCH_FIELD_NAME]) > 0 {
				p.Name = hit.Highlight[ELASTIC_SEARCH_SEARCH_FIELD_NAME][0]
			}
			list = append(list, p)
		}
		count := searchResult.TotalHits()
		currentPage = page
		if count > 0 {
			totalPage = int(math.Ceil(float64(count) / float64(limit)))
		}
		if currentPage >= totalPage {
			pageIsEnd = 1
		}
	}
	return
}
```

That is all for this article. I hope it is helpful for your own work with ElasticSearch.
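The pagination arithmetic at the end of SearchPaper is easy to get wrong at the boundaries, so here is the same ceil-based calculation as a standalone sketch (pure Go, no ES dependency):

```go
package main

import (
	"fmt"
	"math"
)

// paginate mirrors the arithmetic used in SearchPaper: the total hit count
// and page size give the total number of pages, and pageIsEnd flags whether
// the requested page is the last one.
func paginate(count int64, limit, page int) (totalPage, pageIsEnd int) {
	if count > 0 {
		totalPage = int(math.Ceil(float64(count) / float64(limit)))
	}
	if page >= totalPage {
		pageIsEnd = 1
	}
	return
}

func main() {
	tp, end := paginate(21, 10, 3)
	fmt.Println(tp, end) // 21 hits at 10 per page -> 3 pages, and page 3 is the last
}
```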