Automated search for static sites: high performance / real-time search / regular expression support / asynchronous / Web Worker / automatic continuous builds / extensible data format / customizable front-end style
You can try this search feature directly in my blog to experience its high performance.
Run search.js at build time. It indexes all posts on the site and generates an index file, search.json.
On the client, the browser requests the generated search.json and starts the Web Worker search.worker.js.
During a search, the search term and the index data are sent to the Worker, which processes them and returns the search results as JSON.
The front end then processes the returned results and displays them.
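The Worker side of this flow can be pictured with a small sketch. This is not the real search.worker.js: the function names (escapeRegExp, buildPattern, searchIndex) are hypothetical, only the context field is searched, and ranking is reduced to a plain match count. The message shape [data, keyword] follows the postMessage call in frontEnd.js.

```javascript
// Hypothetical sketch of a worker-side search; NOT the real search.worker.js.

// Escape regex metacharacters so a keyword can be matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Treat the keyword as a regular expression, falling back to a literal
// match when it is not valid regex syntax.
function buildPattern(keyword) {
  try {
    return new RegExp(keyword, "gi");
  } catch (err) {
    return new RegExp(escapeRegExp(keyword), "gi");
  }
}

// Count matches in each entry's `context` and sort by match count.
function searchIndex(entries, keyword) {
  const pattern = buildPattern(keyword);
  const results = [];
  for (const entry of entries) {
    const hits = String(entry.context || "").match(pattern);
    const count = hits ? hits.length : 0;
    if (count > 0) {
      results.push({ ...entry, matchTimes: count });
    }
  }
  return results.sort((a, b) => b.matchTimes - a.matchTimes);
}

// Inside a Worker this could be wired up roughly as:
// self.onmessage = (event) => {
//   const [data, keyword] = event.data;
//   self.postMessage(searchIndex(data, keyword));
// };
```

Keeping the matching logic in a pure function like this makes it easy to exercise outside the browser before moving it into a Worker.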
Note: You need to adjust the script according to your own page structure first
Example: If your article directory structure is as follows:
......
├── articles
│ ├── 20200816
│ │ └── index.html
│ ├── 20210701
│ │ └── index.html
│ ├── 20210705
│ │ └── index.html
│ ├── 20210719
│ │ └── index.html
│ ├── 20220206
│ │ └── index.html
......
It needs to be configured in search.js like this:
// Configuration
const articlesFolder = "../articles/"; // root directory of the articles
const articlesFileName = "index.html"; // file name of each article document
const savePath = "../assets/data/search.json"; // where the generated index is saved
This matches paths such as ../articles/*/index.html.
By default, RPageSearch lists every directory in articlesFolder and checks whether it contains articlesFileName; if it does, that file is parsed.
savePath defines where the generated data is saved; it is used again later.
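The directory scan described above can be sketched with Node's built-in fs and path modules. This is only an illustration of the matching rule, not the code in search.js; findArticleFiles is a hypothetical name.

```javascript
const fs = require("fs");
const path = require("path");

// Return every `<root>/<dir>/<fileName>` that actually exists.
// Plain files directly under `root` and directories without the
// target file are skipped.
function findArticleFiles(root, fileName) {
  return fs
    .readdirSync(root, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => path.join(root, entry.name, fileName))
    .filter((filePath) => fs.existsSync(filePath))
    .sort();
}

// Example call with the configuration shown above:
// findArticleFiles("../articles/", "index.html");
```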
Example: If your article document structure is as follows:
<article>
  <div id="articles-header">
    <h2><a href="/articles/20221224/">Hikvision IP camera admin bypass</a></h2>
    <p class="articles-info">
      <time>2022-12-24</time> • <span class="i_small ri:archive-line"></span> <span class="class"><a>Network security</a>/<a>Technology</a></span> • <span class="i_small ri:t-box-line"></span> <span id="textLength">---- characters</span> • <span class="i_small ri:search-eye-line"></span> <span id="pageVisitors">---</span>
    </p>
    <p class="articles-tags">
      <span class="i_small ri:price-tag-3-line"></span> <a>ATTACK</a> <a>JSON</a> <a>MONITOR</a>
    </p>
    <hr>
  </div>
  <div id="articles-body">
    <h3>Heading</h3>
    <p>
      Body text
    </p>
    <br><br>
    <h3>Heading</h3>
    <p>
      Body text
    </p>
    <h3>Heading</h3>
    <p>
      Body text
    </p>
  </div>
  <div id="articles-footer">
    (omitted)
  </div>
</article>
You can configure it in search.js like this:
// Selectors
const articlesName = "#articles-header h2 a"; // article title element
const articlesUrl = "#articles-header h2 a"; // article link element
const articlesTime = "#articles-header .articles-info time"; // article date element
const articlesClass = "#articles-header .articles-info .class a"; // article category elements
const articlesTag = "#articles-header .articles-tags a"; // article tag elements
const articlesBody = "#articles-body"; // article body element
const articlesImages = "#articles-body img"; // article image elements
const articlesLinks = "#articles-body a"; // article outbound link elements
const articlesTitle =
  "#articles-body h2, #articles-body h3, #articles-body h4, #articles-body h5, #articles-body h6"; // article subheading elements
Any CSS selector can be used here; use the example above as a reference.
Install dependencies and execute search.js
npm install
node search.js
If everything works, search.json, which holds the result, is generated at the path defined earlier in savePath.
This file holds the data that is passed to search.worker.js. Its format generally does not need to be adjusted. It contains all indexed objects, each in the following format:
{
  "name": "Minecraft inverted potion effect",
  "url": "/articles/20200816/",
  "time": "2020-08-16",
  "class": [
    "Games"
  ],
  "tag": [
    "minecraft",
    "json"
  ],
  "title": [
    "Principle",
    "Result",
    "Commands"
  ],
  "context": "Article content, omitted",
  "img": [
    "/articles/20200816/show1.gif",
    "/articles/20200816/show2.gif",
    "/articles/20200816/show3.gif"
  ],
  "links": []
}
You need to customize frontEnd.js to fit your own style. For detailed examples, refer to my blog or RTheme.
search.worker.js generally does not require adjustment. It returns the following structure when searching:
[
  {
    "name": "On implementing pseudo-dynamic behavior in static pages",
    "url": "/articles/20220708/",
    "time": "2022-07-08",
    "class": [
      "Technology",
      "Design"
    ],
    "tag": [
      "javascript",
      "html",
      "python"
    ],
    "title": [
      "Preface",
      "Result",
      "Implementation",
      "Automatic page builds"
    ],
    "context": "Article content, omitted here",
    "img": [
      "/articles/20220708/show1.gif",
      "/articles/20220708/show2.png",
      "/articles/20220708/code3.png"
    ],
    "links": [
      "?text=123&?name=aaa#锚点",
      "https://ravelloh.github.io/PSGameSpider/",
      "https://ravelloh.github.io/EverydayNews/",
      "https://github.com/RavelloH/PSGameSpider",
      "?text=123&?name=aaa#实现方法",
      "https://docs.github.com/cn/actions",
      "https://github.com/RavelloH/PSGameSpider",
      "https://github.com/RavelloH/EverydayNews"
    ],
    "match": [
      [
        "context",
        25,
        118
      ],
      [
        "links",
        1
      ]
    ],
    "matchTimes": 25
  },
  {
    "name": "Implementing site-wide search on a static site",
    "url": "/articles/20220825/",
    "time": "2022-08-25",
    "class": [
      "Technology",
      "Design"
    ],
    "tag": [
      "javascript",
      "json",
      "python"
    ],
    "title": [
      "Preface",
      "Result",
      "Implementation",
      "Code analysis",
      "Afterword"
    ],
    "context": "Article content, omitted here",
    "img": [
      "/articles/20220825/show1.jpg"
    ],
    "links": [
      "/articles/",
      "https://github.com/ravelloh/ravelloh.github.io",
      "https://github.com/ravelloh/RPageSearch"
    ],
    "match": [
      [
        "context",
        7,
        103
      ],
      [
        "links",
        1
      ]
    ],
    "matchTimes": 7
  }
]
The returned objects contain all the fields of the index and are sorted by relevance, with matched text wrapped in <mark></mark> tags. In addition, each object carries a special match value that describes every match found.
match is a list of results. In each entry, the first item names the attribute that matched and the second item is the index of the matching element within that attribute.
The context attribute is slightly different from the others: its second item is the number of matches, and its third item is the character position of the first match.
Alongside match, matchTimes gives the number of matches. If the title matches, this value is set to 999999 so that the result is pinned to the top.
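To make the layout of match easier to work with, the entries can be unpacked into labelled objects. The sketch below is based only on the structure described above; describeMatches and isPinned are hypothetical helpers, and the handling of bare string entries is an assumption taken from the typeof check in frontEnd.js.

```javascript
// Convert the raw `match` list into labelled objects.
function describeMatches(result) {
  return result.match.map((item) => {
    // Assumption: an entry is either an array or a bare attribute name.
    const parts = Array.isArray(item) ? item : [item];
    const [field, second, third] = parts;
    if (field === "context") {
      // context: [name, number of matches, position of the first match]
      return { field, count: second, firstPosition: third };
    }
    // other attributes: [name, index of the matching element]
    return { field, index: second };
  });
}

// A title match is signalled by the sentinel value 999999.
function isPinned(result) {
  return result.matchTimes === 999999;
}
```

For the first sample result above, describeMatches yields one context entry with count 25 and firstPosition 118, and one links entry with index 1.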
Because the results arrive through Worker messages, you can refer to the search function in frontEnd.js to configure how your results are displayed.
In the example, the core implementation of search is as follows:
// Fetch the index once, then reuse the cached copy on later calls.
function getSearchData() {
  if (typeof searchData == "undefined") {
    // Return the fetch chain directly so that network or parse errors
    // reject the returned Promise instead of leaving it pending.
    return fetch(dataFilePath, {})
      .then((response) => response.json())
      .then((data) => {
        searchData = data;
        return data;
      });
  } else {
    return Promise.resolve(searchData);
  }
}

function search(keyword) {
  let start = new Date().getTime();
  if (keyword == "" || keyword == ".") {
    return false;
  }
  searchWord = HTMLDecode(keyword);
  getSearchData().then((data) => {
    // Create the Worker lazily and reuse it for later searches.
    if (typeof searchWorker == "undefined") {
      searchWorker = new Worker(workerPath);
    }
    searchWorker.onmessage = (result) => {
      let end = new Date().getTime();
      let data = result.data;
      console.log(`Search took ${end - start} ms`);
      if (data.length == 0) {
        console.log("No matching results found");
        return false;
      }
      let resultHTML = "";
      data.forEach((e, index) => {
        resultHTML += structureSearchResult(e);
      });
      console.log(resultHTML); // output the result
    };
    searchWorker.postMessage([data, searchWord]);
  });
}
This is a simplified version of the corresponding components in RTheme. getSearchData() returns a Promise: it requests the index on first use and resolves immediately with the locally cached data on later calls.
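A variation on this caching idea is to cache the Promise itself rather than the data, so that two searches started before the first request finishes share a single download. The sketch below is not what frontEnd.js does; createIndexLoader and the injected fetchJson function are hypothetical names used for illustration.

```javascript
// `fetchJson` is a hypothetical injected function: (url) => Promise<data>.
// In a browser it could be (url) => fetch(url).then((r) => r.json()).
function createIndexLoader(fetchJson, url) {
  let pending; // cached Promise, shared by all callers

  return function loadIndex() {
    if (pending === undefined) {
      pending = fetchJson(url).catch((err) => {
        pending = undefined; // allow a retry after a failed request
        throw err;
      });
    }
    return pending;
  };
}

// Usage sketch:
// const loadIndex = createIndexLoader(
//   (url) => fetch(url).then((r) => r.json()),
//   "/assets/data/search.json"
// );
// loadIndex().then((data) => console.log(data.length));
```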
The core of the implementation is the search() function, which passes the search term through HTMLDecode() and communicates with the Worker, handing each item of the returned result to structureSearchResult() to build the HTML output. The output format in the example is as follows:
function structureSearchResult(result) {
  let cla = "";
  let tag = "";
  let match = "";
  let matchItem,
    matchNum,
    matchContext,
    matchTag,
    matchClass,
    matchTitle,
    matchLinks;
  let strLinks = "";
  // Build the "matched in" label and remember which attributes matched.
  result.match.forEach((e) => {
    if (typeof e == "object") {
      matchItem = e[0];
    } else {
      matchItem = e;
    }
    switch (matchItem) {
      case "name":
        match += "Title / ";
        break;
      case "context":
        matchNum = e;
        match += `Content (${e[1]} times) / `;
        break;
      case "title":
        match += "Section / ";
        matchTitle = e;
        break;
      case "tag":
        match += "Tag / ";
        matchTag = e;
        break;
      case "class":
        match += "Category / ";
        matchClass = e;
        break;
      case "links":
        match += "Link / ";
        matchLinks = e;
        break;
    }
  });
  // Categories: highlight the one that matched.
  result.class.forEach((e, index) => {
    if (typeof matchClass !== "undefined" && matchClass[1] == index) {
      cla += `<a href="#/classification/${e}" class='active'>${e}</a>/`;
    } else {
      cla += `<a href="#/classification/${e}">${e}</a>/`;
    }
  });
  cla = cla.substring(0, cla.length - 1);
  // Tags: highlight the one that matched.
  result.tag.forEach((e, index) => {
    if (typeof matchTag !== "undefined" && matchTag[1] == index) {
      tag += `<a href="#/tag/${e}" class='active'>${e}</a>`;
    } else {
      tag += `<a href="#/tag/${e}">${e}</a>`;
    }
  });
  // Drop the trailing " / " separator.
  match = match.substring(0, match.length - 3);
  // Excerpt: 150 characters around the first content match.
  if (typeof matchNum == "undefined" || matchNum[2] < 10) {
    matchContext = result.context.substring(0, 150);
  } else {
    matchContext = result.context.substring(
      matchNum[2] - 10,
      matchNum[2] + 140,
    );
  }
  result.links.forEach((e, index) => {
    if (typeof matchLinks !== "undefined" && matchLinks[1] == index) {
      strLinks = `<a class='search-result-links one-line' href='${e}'><span class='i_small ri:link'></span> ${e}</a>`;
    }
  });
  return `
    <div class="loaded listprogram">
      <article>
        <span class="article-name">
          <h4><a href="${result.url}">${result.name}</a></h4>
        </span>
        <p class="articles-info">
          <span class='search-result-tags'>${match}</span>
          <time>${result.time}</time> • <span class="i_small ri:archive-line"></span>
          <span class="class">
            ${cla}
          </span>
          <div class='search-result-context'><span class='i_small ri:file-list-2-line'></span> ...${matchContext}</div>
          ${strLinks}
        </p>
        <p class="articles-tags">
          ${tag}
        </p>
      </article>
      <hr>
    </div>
  `;
}
You can build your own output template from result.url, result.name, match, result.time, cla, matchContext, strLinks, tag and the other result strings. matchContext holds an excerpt of 150 characters around the first match in the content, or the first 150 characters when the content did not match. You can adjust its styling with CSS, or refer directly to RTheme.
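If you want a different excerpt size, the windowing used for matchContext can be pulled out into a small helper. This is a sketch of the same substring logic with the sizes turned into parameters; buildExcerpt and its parameter names are hypothetical.

```javascript
// Return `length` characters of `context`, starting `lead` characters
// before the first match. Without a match position (or when the match
// is near the start), return the first `length` characters instead.
function buildExcerpt(context, firstPosition, length = 150, lead = 10) {
  if (firstPosition === undefined || firstPosition < lead) {
    return context.substring(0, length);
  }
  const start = firstPosition - lead;
  return context.substring(start, start + length);
}

// With the defaults this mirrors the example template:
// buildExcerpt(result.context, matchNum ? matchNum[2] : undefined);
```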
Once the initial setup works, you can use a GitHub Actions workflow to rebuild the index automatically whenever the articles change. Add a .github/workflows/search.yml file, for example:
name: RPageSearch
on:
  push:
    paths:
      - 'articles/**'
  workflow_dispatch:
permissions:
  contents: write
  pages: write
  id-token: write
jobs:
  build:
    runs-on: ubuntu-latest # runner environment
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: 'Git set'
        run: |
          git init
          git pull
      - name: 'Set up nodejs'
        uses: actions/setup-node@v3
        with:
          node-version: 'latest'
      - name: 'Install requirements'
        run: |
          npm install
      - name: 'Working'
        run: |
          node search.js
      - name: Commit files
        continue-on-error: true
        run: |
          git diff
          git config --local user.email "[email protected]"
          git config --local user.name "Ravello-H"
          git add -A
          git commit -m "[RPageSearch]`date '+%Y-%m-%d %H:%M:%S'`" || exit
          git status
          git push -f
Because RPageSearch produces an index file, you can generate other files from it. Refer to the following items:
https://github.com/RavelloH/local-feed-generation
https://github.com/cheeriojs/cheerio
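As a rough illustration of deriving other data from the index, the sketch below turns index entries into a newest-first list of feed items. It is not taken from local-feed-generation; buildFeedItems, the siteUrl parameter and the output field names are all hypothetical, and it assumes the time field keeps the YYYY-MM-DD format shown earlier.

```javascript
// Build a newest-first list of feed items from search.json entries.
// Assumes `time` is a YYYY-MM-DD string, so string order is date order.
function buildFeedItems(entries, siteUrl, limit = 10) {
  return entries
    .slice() // copy, so the original index is not reordered
    .sort((a, b) => b.time.localeCompare(a.time))
    .slice(0, limit)
    .map((entry) => ({
      title: entry.name,
      link: new URL(entry.url, siteUrl).href,
      date: entry.time,
      categories: entry.class,
    }));
}

// Usage sketch (Node):
// const entries = JSON.parse(fs.readFileSync("../assets/data/search.json", "utf8"));
// console.log(buildFeedItems(entries, "https://example.com"));
```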