Librarian
1.0.0

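I'll find what is on your mind but isn't there yet.
A simple CLI command to run over your text files when you want to search them intelligently, applying full-text search techniques without cumbersome software.
Besides the CLI command, the abstractions it is built from can be imported and reused for your own needs. You can plug them into a service.
Single-file utility. Python and SQLite only.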
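Because the tool depends only on Python and SQLite, it can help to confirm first that the local Python build exposes what full-text indexing needs. The sketch below is not part of librarian.py; the function name check_sqlite_capabilities and the dictionary keys are made up for illustration. It only checks that the sqlite3 module can load extensions and that the bundled SQLite has FTS5 compiled in.

```python
import sqlite3


def check_sqlite_capabilities():
    """Report whether this Python build can load extensions and use FTS5."""
    conn = sqlite3.connect(":memory:")
    try:
        # The method is missing when Python was built without
        # --enable-loadable-sqlite-extensions.
        can_load = hasattr(conn, "enable_load_extension")
        try:
            # Creating a throwaway FTS5 table fails if FTS5 is not compiled in.
            conn.execute("CREATE VIRTUAL TABLE probe USING fts5(content)")
            has_fts5 = True
        except sqlite3.OperationalError:
            has_fts5 = False
    finally:
        conn.close()
    return {"loadable_extensions": can_load, "fts5": has_fts5}


capabilities = check_sqlite_capabilities()
print(capabilities)
```

If either value is False, the stemmer library cannot be loaded or the index table cannot be created, so Python would need to be rebuilt or replaced first.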
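Python: built with $ ./configure --enable-loadable-sqlite-extensions, so that it is linked against an SQLite binary that can load extensions and has the fts5 extension enabled. No further setup is needed in the code.
fts5stemmer.so: a compiled shared library used to index target files in your preferred language.

$ python librarian.py -h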
usage: librarian.py [-h] [--db DB] [--table TABLE] [--debug] [--sql-trace] {index,match,update} ...
optional arguments:
-h, --help show this help message and exit
--db DB DB file path. (default: librarian.db)
--table TABLE Table name to store files content. (default: documents)
--debug Flag of print additional events. (default: False)
--sql-trace Flag of print sqlite statements. (default: False)
available commands:
Use -h with each of them to get help.
{index,match,update}
index Command to build a db and index. Have to be run once.
match Command to run query on indexed files.
update Command to check if content is changed and update in the database.

$ python librarian.py index -h
usage: librarian.py index [-h] [--file-extensions string] [--language LANGUAGE] target
positional arguments:
target Directory or a file to build an index on.
optional arguments:
-h, --help show this help message and exit
--file-extensions string
List of file extensions separated by space which to scan only. (default: frozenset({'.md'}))
--language LANGUAGE list of available languages https://snowballstem.org/algorithms/ (default: russian)
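
The target and --file-extensions options describe a scan that keeps only files with the listed extensions. How librarian.py implements the scan is not shown here, so the following is only a minimal sketch of one way to do it with pathlib; collect_files is a hypothetical helper, and the temporary directory exists only for the demonstration.

```python
import tempfile
from pathlib import Path


def collect_files(target, extensions=frozenset({".md"})):
    """Return files under target (a directory or a single file) with an allowed extension."""
    target = Path(target)
    if target.is_file():
        return [target] if target.suffix in extensions else []
    return sorted(
        path for path in target.rglob("*")
        if path.is_file() and path.suffix in extensions
    )


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Project").mkdir()
    (root / "Project" / "README.md").write_text("# Notes", encoding="utf-8")
    (root / "Today").write_text("no extension", encoding="utf-8")
    found = [path.relative_to(root).as_posix() for path in collect_files(root)]

print(found)
```

Files without a matching suffix, such as Today in the demonstration, are skipped.
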
$ python librarian.py match -h
usage: librarian.py match [-h] [--limit LIMIT] [--fields field,...] [--format {raw,csv}] [--snippet 1, ' ' , ' ' , ' ' , ' ' ,10] query
positional arguments:
query Sqlite query term executed by "MATCH" statement. Syntax can be found on https://sqlite.org/fts5.html#full_text_query_syntax.
optional arguments:
-h, --help show this help message and exit
--limit LIMIT Max count of results. (default: 5)
--fields field,... List of document fields to retrieve separated by comma, order is preserved. Choices: ('path', 'extension', 'size', 'created', 'modified', 'hash', 'rank', 'snippet', 'rowid'). (default:
('path', 'snippet'))
--format {raw,csv} Choose a results output format. (default: csv)
--snippet 1, ' ' , ' ' , ' ' , ' ' ,10
Snippet properties settings https://sqlite.org/fts5.html#the_snippet_function (default: None)

$ python librarian.py update -h
usage: librarian.py update [-h] [--clean]
optional arguments:
-h, --help show this help message and exit
--clean Delete missing file records (default: False)

Let's build a DB.
$ python librarian.py --debug --sql-trace index ~/Text-files-documents
SELECT fts5(NULL)
CREATE VIRTUAL TABLE IF NOT EXISTS documents
USING FTS5(path, content, extension, size, created, modified, tokenize='snowball russian');
Excluded: /home/user/Text-files-documents/Today
Excluded: /home/user/Text-files-documents/Yesterday
Indexed: /home/user/Text-files-documents/Project/README.md
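
The trace above shows an FTS5 virtual table being created and files being added to it. As a rough illustration of that step (not the actual librarian.py code), the sketch below creates a table with the same columns and inserts one file. It assumes the default FTS5 tokenizer instead of 'snowball russian', because the snowball tokenizer requires loading fts5stemmer.so; build_index and COLUMNS are hypothetical names.

```python
import sqlite3
import tempfile
from pathlib import Path

# Column names taken from the CREATE VIRTUAL TABLE statement in the trace.
COLUMNS = ("path", "content", "extension", "size", "created", "modified")


def build_index(conn, table, files):
    """Create the FTS5 table if needed and insert one row per file."""
    columns = ", ".join(COLUMNS)
    placeholders = ", ".join("?" for _ in COLUMNS)
    # The table name is interpolated into SQL, so it must come from a trusted source.
    conn.execute(f"CREATE VIRTUAL TABLE IF NOT EXISTS {table} USING fts5({columns})")
    for file in files:
        stat = file.stat()
        conn.execute(
            f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
            # st_ctime is the creation time on Windows but the
            # metadata-change time on most Unix systems.
            (str(file), file.read_text(encoding="utf-8"), file.suffix,
             stat.st_size, stat.st_ctime, stat.st_mtime),
        )
    conn.commit()


# Demonstration with an in-memory database and a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    readme.write_text("Start the docker services before running tests", encoding="utf-8")
    conn = sqlite3.connect(":memory:")
    build_index(conn, "documents", [readme])
    indexed_count = conn.execute("SELECT count(*) FROM documents").fetchone()[0]
    stored_extension = conn.execute("SELECT extension FROM documents").fetchone()[0]
    conn.close()

print(indexed_count, stored_extension)
```
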
...and try it out.
$ python librarian.py match --fields created,path тесты
[(1620583081.3579128,
'/home/user/Text-files-documents/Project/README.md',
'Для запуска тестов нужно поднимать докер-сервисы\n \n\n### Описание запуска'),
...
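
For readers who want to see what such a lookup involves at the SQLite level, here is a minimal sketch of an FTS5 MATCH query that returns a path and a snippet, ordered by rank. It is an assumption about the general technique, not the query librarian.py actually issues: the search function, the sample rows, and the snippet markers '[' and ']' are illustrative, and the default tokenizer stands in for the snowball one.

```python
import sqlite3


def search(conn, table, query, limit=5):
    """Run an FTS5 MATCH query and return (path, snippet) rows, best match first."""
    # snippet(table, column index, open marker, close marker, ellipsis, max tokens);
    # column index 1 is "content" in this two-column table.
    sql = (
        f"SELECT path, snippet({table}, 1, '[', ']', '...', 10) "
        f"FROM {table} WHERE {table} MATCH ? ORDER BY rank LIMIT ?"
    )
    return conn.execute(sql, (query, limit)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE documents USING fts5(path, content)")
conn.executemany(
    "INSERT INTO documents (path, content) VALUES (?, ?)",
    [
        ("a.md", "Run the tests after starting the docker services"),
        ("b.md", "Release notes for the project"),
    ],
)
results = search(conn, "documents", "tests")
conn.close()

print(results)
```

Passing the query as a bound parameter keeps user input out of the SQL text; only the table name is interpolated, so it should come from a trusted source such as the --table option.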