Blazing-fast semantic search for Discord channels
ask-discord lets users semantically search a dataset of Discord messages. It offers two search modes, Raw and LLM.
Clone the repository:
git clone https://github.com/yourusername/ask-discord.git
cd ask-discord
Install dependencies:
pip install -r requirements.txt
Set up environment variables:
Create a .env file in the root directory and add your OpenAI API key:
OPENAI_KEY=your_openai_api_key
Start Milvus: Follow the Milvus installation guide to set up and start Milvus (requires a recent Docker installation).
Generate the data: Download your channels of interest using Discord Chat Exporter. Read this guide if you have trouble getting your Token and Channel IDs. This is not an endorsement: downloading channels may violate Discord's Terms of Service.
Load the data:
Ensure the JSON data file is at the path specified in the configs (JSON_DATA_PATH). Modify the path in ask-discord.py if needed.
Run the Streamlit application:
streamlit run ask-discord.py
Access the application:
Open your web browser and go to http://localhost:8501.
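If the "Load the data" step fails, it can help to inspect the export file directly. The following is a minimal sketch, not the project's own loader. It assumes the Discord Chat Exporter JSON layout with a top-level "messages" list whose entries carry a "content" string and an "author" object with a "name" field; the helper name load_messages is hypothetical.

```python
import json
from pathlib import Path


def load_messages(json_path):
    """Return (author, text) pairs from a channel export, skipping empty messages.

    Assumes the export has a top-level "messages" list (an assumption about
    the Discord Chat Exporter JSON layout, not taken from ask-discord itself).
    """
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    pairs = []
    for message in data.get("messages", []):
        # Attachment-only messages may have empty or missing text content.
        text = (message.get("content") or "").strip()
        if not text:
            continue
        author = (message.get("author") or {}).get("name", "unknown")
        pairs.append((author, text))
    return pairs
```

Calling load_messages on the path set in JSON_DATA_PATH and checking the length of the result is a quick way to confirm the file is readable before starting the Streamlit app.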
Chatbot class which handles querying Milvus and interacting with Raw/LLM mode.
Configurations are managed through a dictionary in ask-discord.py. These include:
OPENAI_CLIENT: OpenAI client instance.
CHAT_MODEL: The model to use for chat (e.g., gpt-4o).
EMBEDDING_MODEL: The model to use for generating embeddings.
JSON_DATA_PATH: Path to the JSON data file.
EMBEDDING_DIMENSIONS: Vector dimensions.
MAX_MESSAGE_LENGTH: Maximum number of characters in a message to be considered.
MIN_MESSAGE_LENGTH: Minimum number of characters in a message to be considered.
COLLECTION_NAME: Name of the Milvus collection.
MAX_SIMILAR_EXAMPLES: Maximum number of similar messages to retrieve.
SIMILARITY_SCORE_CUTOFF: Cutoff for similarity score.
Contributions are welcome! Please submit a pull request or open an issue to discuss any changes.
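For reference, the configuration dictionary in ask-discord.py might look roughly like the sketch below. Only the key names and the gpt-4o example come from this README; every other value is a placeholder assumption, and the two helper functions (is_indexable, select_hits) are hypothetical illustrations of how the length limits and the similarity cutoff could be applied. The sketch also assumes that a higher score means a closer match, which depends on the metric the Milvus collection uses.

```python
# Placeholder values: only the key names and "gpt-4o" come from the README.
configs = {
    "OPENAI_CLIENT": None,  # the real app stores an OpenAI client instance here
    "CHAT_MODEL": "gpt-4o",
    "EMBEDDING_MODEL": "text-embedding-3-small",  # assumption
    "JSON_DATA_PATH": "data/channel.json",        # assumption
    "EMBEDDING_DIMENSIONS": 1536,                 # assumption
    "MAX_MESSAGE_LENGTH": 2000,                   # assumption
    "MIN_MESSAGE_LENGTH": 20,                     # assumption
    "COLLECTION_NAME": "discord_messages",        # assumption
    "MAX_SIMILAR_EXAMPLES": 5,                    # assumption
    "SIMILARITY_SCORE_CUTOFF": 0.5,               # assumption
}


def is_indexable(text, cfg):
    """True if the message length falls within the configured bounds."""
    return cfg["MIN_MESSAGE_LENGTH"] <= len(text) <= cfg["MAX_MESSAGE_LENGTH"]


def select_hits(hits, cfg):
    """Keep the best-scoring (text, score) pairs at or above the cutoff.

    Assumes higher scores mean more similar messages.
    """
    kept = [hit for hit in hits if hit[1] >= cfg["SIMILARITY_SCORE_CUTOFF"]]
    kept.sort(key=lambda hit: hit[1], reverse=True)
    return kept[: cfg["MAX_SIMILAR_EXAMPLES"]]
```

Raising SIMILARITY_SCORE_CUTOFF trades recall for precision, while MAX_SIMILAR_EXAMPLES caps how many messages are returned regardless of score.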
This project is licensed under the MIT License. See the LICENSE file for more details.