DawnSearch is an open source distributed web search engine that searches by meaning rather than by keyword. It embeds text with the all-MiniLM-L6-v2 model and uses USearch for vector search. It can index the Common Crawl data. DawnSearch is written in Rust.
A public instance is available at dawnsearch.org.
DawnSearch currently functions as a distributed semantic vector search. When you start an instance, it registers with the tracker. The instance can then participate in the network by searching. Optionally, it can also index the Common Crawl dataset and answer queries.
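To make "searching by meaning" concrete, here is a minimal sketch of ranking documents by cosine similarity between embedding vectors. It is not DawnSearch's code: DawnSearch delegates vector search to USearch, while this sketch uses a brute-force scan for illustration. The function names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the vectors are assumed to come from an embedding model such as all-MiniLM-L6-v2, which produces 384-dimensional embeddings.

```rust
/// Cosine similarity between two vectors of the same dimension.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Brute-force ranking (illustration only, not an approximate index):
/// returns (document index, similarity) pairs, most similar first.
fn rank_by_similarity(query: &[f32], docs: &[Vec<f32>]) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine_similarity(query, d)))
        .collect();
    // Sort by descending similarity; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

Calling `rank_by_similarity(&query_embedding, &doc_embeddings)` returns the documents whose embeddings point in the direction closest to the query's. A vector index like USearch serves the same purpose without comparing the query against every document.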
Main items still to do:
DawnSearch is looking for:
Please open issues for any questions or feedback. If you want to contribute something big, like a feature or a refactor, open an issue before you start so you don't do duplicate work!
This will build and run an 'access terminal' DawnSearch instance on a recent Ubuntu, without GPU acceleration. See Modes for examples of other configurations.
sudo apt-get update && sudo apt-get install -y build-essential pkg-config
# Install Rust if you don't have it already:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
mv DawnSearch.toml.example DawnSearch.toml
RUSTFLAGS='-C target-cpu=native' cargo run --release
Now, go to http://localhost:8080 to access your own DawnSearch instance. You will be able to perform searches, but you will not contribute to the network yet. Take a look at Modes to see how you can do so.
To enable GPU acceleration, build with the cuda feature instead. You need to have CUDA installed:
RUSTFLAGS='-C target-cpu=native' cargo run --release --features cuda
Note that on an M1/M2 Mac, 'cargo install' does NOT work, but 'cargo build' does.
Feel free to open an issue if you encounter problems!
You can configure DawnSearch through DawnSearch.toml or through environment variables like DAWNSEARCH_INDEX_CC.
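As an illustration of how a setting might be resolved from these two sources, here is a minimal sketch. It is not DawnSearch's actual configuration loader. It assumes that DAWNSEARCH_INDEX_CC holds a boolean, that the environment variable takes precedence over the value from DawnSearch.toml, and that the setting defaults to false; all three are assumptions of this example, and the function names are hypothetical.

```rust
use std::env;

/// Parse a boolean-like string.
/// The accepted spellings are an assumption of this sketch.
fn parse_bool(raw: &str) -> Option<bool> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "true" | "1" => Some(true),
        "false" | "0" => Some(false),
        _ => None,
    }
}

/// Resolve a setting: environment value first, then the value read
/// from the config file, then the default. Unparseable environment
/// values are ignored. (Assumed precedence, not confirmed by DawnSearch.)
fn resolve_bool(env_value: Option<&str>, file_value: Option<bool>, default: bool) -> bool {
    env_value
        .and_then(parse_bool)
        .or(file_value)
        .unwrap_or(default)
}

/// Hypothetical helper: `file_value` is whatever was read from
/// DawnSearch.toml, or None if the key is absent.
fn index_cc_setting(file_value: Option<bool>) -> bool {
    let from_env = env::var("DAWNSEARCH_INDEX_CC").ok();
    resolve_bool(from_env.as_deref(), file_value, false)
}
```

Under these assumptions, starting the instance with `DAWNSEARCH_INDEX_CC=true` in the environment would enable the setting even if the file says otherwise. Check DawnSearch.toml.example for the keys and value formats that DawnSearch actually accepts.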
Work in progress!