MHTextSearch

A fast and minimal embedded full-text indexing library written in Objective-C, built on top of Objective-LevelDB.

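Installation

By far the easiest way to integrate this library into your project is with CocoaPods.

Install CocoaPods, if you haven't already.
In your Podfile, add the line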
```
 pod 'MHTextSearch'
```
Run pod install.
Add the libc++.dylib framework to your project.

Simple API

Creating an embedded text index

MHTextIndex *index = [MHTextIndex textIndexInLibraryWithName:@"my.awesome.index"];

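Indexing any object

You can tell an MHTextIndex instance to index your objects (any object):

[index indexObject:anyObjectYouWant];
[index updateIndexForObject:anotherPreviousIndexedObject];
[index removeIndexForObject:anotherPreviousIndexedObject];

For this to work, however, you need to tell us which NSData * identifier can be used to uniquely refer to the object: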

[index setIdentifier:^NSData *(MyCustomObject *object){
    return object.indexID; // an NSData instance
}];

You also need to give us details about the object, such as the pieces of text to index:

[index setIndexer:^MHIndexedObject *(MyCustomObject *object, NSData *identifier){
    MHIndexedObject *indx = [MHIndexedObject new];
    indx.strings = @[ object.title, object.description ]; // Indexed strings
    indx.weight = object.awesomenessLevel;                // Weight given to this object when sorting results
    indx.context = @{ @"title": object.title };           // An NSDictionary that will be given alongside search results
    return indx;
}];

Finally, if you want to be able to refer to the original object easily when you get search results, you can tell us how to fetch it for you:

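[index setObjectGetter:^MyCustomObject *(NSData *identifier){
    return [MyCustomObject customObjectFromIdentifier:identifier];
}];

That's it! That is all you need to get a full-text index. As you would expect, MHTextSearch takes care of splitting the text into words, ignoring diacritics and capitalization (well, Foundation does most of the work here).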
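To illustrate the kind of work Foundation can do for word splitting and normalization, here is a minimal sketch. It is not MHTextSearch's actual implementation; `NormalizedWords` is a hypothetical helper, and the choice of folding options is an assumption.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of MHTextSearch: splits text into words and
// folds away capitalization and diacritics using Foundation only.
static NSArray<NSString *> *NormalizedWords(NSString *text) {
    NSMutableArray<NSString *> *words = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        if (substring == nil) {
            return; // defensive: nothing to fold
        }
        // Case- and diacritic-insensitive folding, locale-independent.
        NSString *folded =
            [substring stringByFoldingWithOptions:(NSCaseInsensitiveSearch |
                                                   NSDiacriticInsensitiveSearch)
                                           locale:nil];
        [words addObject:folded];
    }];
    return words;
}
```

With this helper, `NormalizedWords(@"Crème Brûlée")` should yield the two words `creme` and `brulee`, which is why a search for "duck" can match "Duck" in a title.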

You can then start searching:

[index enumerateObjectsForKeyword:@"duck" options:0 withBlock:^(MHSearchResultItem *item,
                                                                NSUInteger rank,
                                                                NSUInteger count,
                                                                BOOL *stop){

    item.weight;      // As provided by you earlier
    item.rank;        // The effective rank in the search result
    item.object;      // The first time it is used, it will use the block
                      // you provided earlier to get the object
    item.context;     // The dictionary you provided in the "indexer" block
    item.identifier;  // The object identifier you provided in the "identifier" block

    NSIndexPath *token = item.resultTokens[0];
    /* resultTokens is an NSArray of NSIndexPath instances, each containing 3 indices:
     *   - mh_string : the string in which the token occurred
     *                 (here, 0 for the object's title)
     *   - mh_word : the position in the string where the word containing
     *               the token occurred
     *   - mh_token : the position in the word where the token occurred
     */

    NSRange tokenRange = [item rangeOfToken:token];
    /* This gives the exact range of the matched token in the string where it was found.
     *
     * So, according to the example setup I've been giving from the start,
     * if token.mh_string == 0, that means the token was found in the object's "title",
     * and [item.object.title substringWithRange:tokenRange] would yield "duck" (minus
     * capitalization and diacritics).
     */
}];

You can also get all the MHSearchResultItem instances at once, as an array:

NSArray *resultSet = [index searchResultForKeyword:@"duck"
                                           options:NSEnumerationReverse];

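Subclassing

If specifying behavior with blocks isn't your thing, you can also override the following methods:

-[MHTextIndex getIdentifierForObject:], which by default uses the identifier block
-[MHTextIndex getIndexInfoForObject:andIdentifier:], which by default uses the indexer block
-[MHTextIndex compareResultItem:withItem:reversed:], which is used to order the search result set

Using with Core Data

You can use the NSManagedObject lifecycle methods to trigger changes to the text index. The following example is taken from http://www.adevelopingstory.com/blog/2013/04/adding-full-text-search-cearch-core-core-data.html and adapted for this project: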

// textIndex is assumed to be an MHTextIndex instance reachable from this class.

- (void)prepareForDeletion
{
    [super prepareForDeletion];

    if (self.indexID.length) {
        // removeIndexForObject: is the removal method listed in the API above
        [textIndex removeIndexForObject:self.indexID];
    }
}

+ (NSData *)createIndexID {
    // Use the 16 raw bytes of a fresh UUID as the identifier
    NSUUID *uuid = [NSUUID UUID];
    uuid_t uuidBytes;
    [uuid getUUIDBytes:uuidBytes];
    return [NSData dataWithBytes:uuidBytes length:16];
}

- (void)willSave
{
    [super willSave];

    if (self.indexID.length) {
        [textIndex updateIndexForObject:self.indexID];
    } else {
        self.indexID = [[self class] createIndexID];
        [textIndex indexObject:self.indexID];
    }
}

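Operations and queues

MHTextIndex uses an NSOperationQueue under the hood to coordinate indexing operations. It is exposed as a property named indexingQueue. You can therefore set its maxConcurrentOperationCount property to control how concurrent indexing is. Since the underlying database library that performs the I/O is thread-safe, concurrency is not a problem. This also means you can explicitly wait for indexing operations to finish using:

[index.indexingQueue waitUntilAllOperationsAreFinished];

The three indexing methods, -[MHTextIndex indexObject:], -[MHTextIndex updateIndexForObject:] and -[MHTextIndex removeIndexForObject:], all return NSOperation instances. If you need to, you can take advantage of them through their completionBlock property or the -[NSOperation waitUntilFinished] method.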
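The two ways of waiting on a returned operation can be sketched with Foundation alone. In this sketch a plain NSBlockOperation stands in for the operation an indexing method would return, and `RunAndWait` is a hypothetical helper; no MHTextSearch call is made.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: runs one stand-in operation on a queue, then waits for
// both the operation itself and its completionBlock.
static BOOL RunAndWait(void) {
    NSOperationQueue *queue = [[NSOperationQueue alloc] init];
    queue.maxConcurrentOperationCount = 1; // cap on simultaneous operations

    __block BOOL workDone = NO;
    // Stand-in for the NSOperation returned by an indexing method.
    NSOperation *op = [NSBlockOperation blockOperationWithBlock:^{
        workDone = YES;
    }];

    dispatch_semaphore_t completed = dispatch_semaphore_create(0);
    op.completionBlock = ^{
        dispatch_semaphore_signal(completed);
    };

    [queue addOperation:op];
    [op waitUntilFinished]; // blocks until the operation body has run

    // completionBlock is invoked asynchronously after the operation finishes,
    // so it needs its own synchronization if the caller depends on it.
    dispatch_semaphore_wait(completed, DISPATCH_TIME_FOREVER);
    return workDone;
}
```

Blocking with waitUntilFinished is convenient in tests and batch imports; in UI code, reacting in completionBlock avoids stalling the calling thread.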

Searching is also concurrent, but it uses a dispatch_queue_t (not yet exposed or tunable).

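Performance and fine-tuning

There are a few knobs you can play with to make MHTextSearch better fit your needs.

An MHTextIndex instance has a skipStopWords boolean property, which is true by default and avoids indexing very common English words. (TODO: make it work with other languages)
It also has a minimalTokenLength property, which equals 2 by default. It sets the minimum number of letters a token needs in order to be indexed. This greatly reduces the size of the index, as well as indexing and search times. When set to 2, it skips indexing single-letter words and the last letter of each word.
When indexing long-form text (documents rather than simple names), you can turn on the discardDuplicateTokens boolean property on MHTextIndex. This makes the index consider only the first occurrence of each indexed token in a given text. If you only need to know whether a token appears in a text, not where each occurrence is, this can speed up indexing by a factor of 3 to 5.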
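The remark that a minimalTokenLength of 2 skips the last letter of each word suggests that tokens are word suffixes. Under that assumption, the effect of the setting can be sketched as follows. `SuffixTokens` is a hypothetical helper for illustration only, not the library's code, and it works on UTF-16 units without handling composed character sequences.

```objective-c
#import <Foundation/Foundation.h>

// Illustration only: assumes the tokens of a word are its suffixes that are
// at least minimalTokenLength letters long.
static NSArray<NSString *> *SuffixTokens(NSString *word, NSUInteger minimalTokenLength) {
    NSMutableArray<NSString *> *tokens = [NSMutableArray array];
    NSUInteger minLength = MAX(minimalTokenLength, (NSUInteger)1);
    // Each start position yields one suffix; stop once the suffix gets too short.
    for (NSUInteger start = 0; start + minLength <= word.length; start++) {
        [tokens addObject:[word substringFromIndex:start]];
    }
    return tokens;
}
```

Here `SuffixTokens(@"duck", 2)` gives duck, uck and ck, leaving out the final k, while a one-letter word such as "a" gives no tokens at all. Lowering the value to 1 adds one extra token per word, which is where the index size and timing cost would come from.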

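The following graphs show indexing and searching times (in seconds) as a function of the size of the indexed text, ranging from 500 kB to about 10 MB. The benchmarks were run on an iPhone 5.

Testing

If you want to run the tests, you will need Xcode 5, since the test suite uses the new XCTest.

Clone this repository and, once in it,

$ cd "MHTextSearch iOS Tests"
$ pod install
$ cd .. && open *.xcworkspace

Currently, all tests are set up to run with the iOS test suite.

License

Distributed under the MIT License.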
