IndexerMcIndexFace Download - IndexerMcIndexFace Source code download

IndexerMcIndexFace

A (toy) low-level document indexing and retrieval system

IndexerMcIndexFace is a tiny traditional document indexing and retrieval system that I wrote as an excuse to play with FSTs (finite state transducers, using the BurntSushi/fst crate) and Rust's parallelization capabilities (using the crossbeam crate for message passing).

Features:

Fully written in Rust
Uses FSTs for fast access to postings
Allows fielded documents, and uses the BM25F retrieval model (note: I didn't verify its correctness)
The indexing stage is parallelized with a threadpool by creating and merging independent indexes
- (Note that this is a naive implementation: although it's extremely fast, it can be really memory hungry)
The retrieval stage is also parallelized with a threadpool, which in this case runs a separate search for every token
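
To make the "create and merge independent indexes" idea concrete, here is a minimal sketch of that pattern. It is not the project's actual code: it uses only the standard library (scoped threads and std::sync::mpsc) instead of a threadpool and crossbeam, keeps everything in memory, and the names Index, index_chunk, merge_into and build_parallel are hypothetical, chosen for illustration only.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical in-memory index: term -> postings of (document id, term frequency).
type Index = BTreeMap<String, Vec<(usize, usize)>>;

/// Build an independent index for one slice of (doc_id, text) pairs.
fn index_chunk(docs: &[(usize, String)]) -> Index {
    let mut index = Index::new();
    for (doc_id, text) in docs {
        // Count term frequencies for this document (naive whitespace tokenizer).
        let mut counts: BTreeMap<String, usize> = BTreeMap::new();
        for token in text.split_whitespace() {
            *counts.entry(token.to_lowercase()).or_insert(0) += 1;
        }
        for (term, tf) in counts {
            index.entry(term).or_default().push((*doc_id, tf));
        }
    }
    index
}

/// Merge one partial index into the accumulated one.
fn merge_into(total: &mut Index, partial: Index) {
    for (term, mut postings) in partial {
        total.entry(term).or_default().append(&mut postings);
    }
}

/// Index `docs` with up to `workers` threads, one partial index per thread.
fn build_parallel(docs: &[(usize, String)], workers: usize) -> Index {
    let workers = workers.max(1);
    let chunk_size = ((docs.len() + workers - 1) / workers).max(1);
    let (tx, rx) = mpsc::channel();

    // Every worker holds a full partial index in memory before sending it,
    // which is where the memory hunger mentioned above comes from.
    thread::scope(|scope| {
        for chunk in docs.chunks(chunk_size) {
            let tx = tx.clone();
            scope.spawn(move || {
                tx.send(index_chunk(chunk)).expect("receiver is still alive");
            });
        }
    });
    drop(tx); // close the channel so the loop below terminates

    let mut total = Index::new();
    for partial in rx {
        merge_into(&mut total, partial);
    }
    // Partial indexes arrive in arbitrary order, so sort postings by doc id.
    for postings in total.values_mut() {
        postings.sort_unstable();
    }
    total
}

fn main() {
    let docs = vec![
        (0, "rust makes indexing fast".to_string()),
        (1, "fast fast retrieval".to_string()),
        (2, "indexing and retrieval in rust".to_string()),
    ];
    let index = build_parallel(&docs, 2);
    println!("{:?}", index.get("fast"));
}
```

The merge step is sequential here, which keeps the sketch short; a real implementation would also write the merged result to disk rather than keep it in memory.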

Warnings:

This is a toy project (e.g. index files are not compressed, and the parallelization techniques are naive and resource-hungry), and the API is very basic.

Usage:

Simply run cargo run --release. main.rs will create a dummy collection of 1000 files using the MitchellRhysHall/random_word crate, then index it and run a randomised, moderately sized query against it.

Possible improvements:

The use of FSTs opens up many possibilities, as regex-like searches can be easily performed.
Better parallelization techniques: right now, each thread creates its own in-memory index, and these indexes are later merged and written to binary files. This means that memory usage can be very high for bigger collections of documents.
Better tokenizers.
N-gram or similar, more elaborate, indexes.
Alternative retrieval models, phrase queries, etc.
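
As an illustration of the n-gram idea from the list above, here is a rough sketch of a character n-gram index over the term vocabulary, which allows substring lookups. Nothing here exists in the project: the names NgramIndex, char_ngrams, build_ngram_index and search_substring are hypothetical, and the whole thing is a standard-library toy rather than something built on FSTs.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical n-gram index: character n-gram -> terms that contain it.
type NgramIndex = BTreeMap<String, BTreeSet<String>>;

/// Split a term into overlapping character n-grams.
fn char_ngrams(term: &str, n: usize) -> Vec<String> {
    let chars: Vec<char> = term.chars().collect();
    if n == 0 || chars.len() < n {
        return Vec::new();
    }
    chars
        .windows(n)
        .map(|window| window.iter().collect::<String>())
        .collect()
}

/// Map every n-gram to the set of vocabulary terms it appears in.
fn build_ngram_index(terms: &[&str], n: usize) -> NgramIndex {
    let mut index = NgramIndex::new();
    for term in terms {
        for gram in char_ngrams(term, n) {
            index.entry(gram).or_default().insert(term.to_string());
        }
    }
    index
}

/// Find terms containing `fragment`: intersect the term sets of all its
/// n-grams, then confirm each candidate with a real substring check.
/// Fragments shorter than `n` produce no n-grams and return nothing.
fn search_substring(index: &NgramIndex, fragment: &str, n: usize) -> Vec<String> {
    let mut candidates: Option<BTreeSet<String>> = None;
    for gram in char_ngrams(fragment, n) {
        let terms = index.get(&gram).cloned().unwrap_or_default();
        candidates = Some(match candidates {
            None => terms,
            Some(current) => current.intersection(&terms).cloned().collect(),
        });
    }
    candidates
        .unwrap_or_default()
        .into_iter()
        .filter(|term| term.contains(fragment))
        .collect()
}

fn main() {
    let vocabulary = ["index", "indexer", "retrieval", "face"];
    let index = build_ngram_index(&vocabulary, 3);
    println!("{:?}", search_substring(&index, "dex", 3));
}
```

The matching terms could then be looked up in the main index as usual, so the n-gram index only needs to cover the vocabulary, not the documents.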

Additional Information

Version 1.0.0
Type Other source code
Update Time 2025-05-27
Size 27.78 KB
From Github
