Compendium Keeper is a tool that indexes Compendium data (generated by Compendium Scribe) into a vector database (like Pinecone) to power Retrieval-Augmented Generation (RAG) workflows.
Supports the .compendium.pickle and .compendium.xml file formats.
git clone https://github.com/yourusername/compendiumkeeper.git
cd compendiumkeeper
Ensure you have PDM installed. Then run:
pdm install
Create a .env file in the root directory of the project to store your API keys and configuration. You can use the provided .env.example as a template.
.env File
# .env.example
# OpenAI API Key for generating embeddings
OPENAI_API_KEY=sk-your-openai-api-key
# Pinecone API Key and Environment
PINECONE_API_KEY=your-pinecone-api-key
PINECONE_ENVIRONMENT=us-east-1-aws
Rename .env.example to .env and replace the placeholder values with your actual API keys.
compendium-scribe-create-compendium --domain "Cell Biology"
This produces files like cell_biology_2024-12-05.compendium.pickle and cell_biology_2024-12-05.compendium.xml.
Use the --compendium-file option to specify the Compendium file (pickle or XML).
You must also specify the vector database index name using the --index-name option.
Ensure your .env file is properly configured with the necessary API keys.
pdm run compendium-keeper index --compendium-file cell_biology_2024-12-05.compendium.pickle --index-name my_knowledge_index
pdm run compendium-keeper index --compendium-file cell_biology_2024-12-05.compendium.xml --index-name my_knowledge_index
After successful execution, you should see a confirmation message indicating the number of concepts indexed.
Indexed 25 concepts from domain 'Cell Biology' into index 'my_knowledge_index'.
Indexing complete!
To create a single knowledge base that spans multiple Compendia, repeat the indexing process for each Compendium, using the same --index-name.
For example:
pdm run compendium-keeper index --compendium-file django_2024-12-10.compendium.pickle --index-name all_python_knowledge
pdm run compendium-keeper index --compendium-file flask_2024-12-10.compendium.xml --index-name all_python_knowledge
This merges the knowledge from multiple Compendia into the same vector database index.
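When there are many Compendia to merge, the repeated commands can be generated rather than typed by hand. The following is a minimal sketch, not part of Compendium Keeper itself: the helper names build_index_command and index_all are hypothetical, and only the --compendium-file and --index-name options shown above are used. By default it only returns the commands so they can be reviewed before anything runs.

```python
import subprocess
from pathlib import Path


def build_index_command(compendium_file: Path, index_name: str) -> list[str]:
    """Build the argument list for one `compendium-keeper index` invocation."""
    return [
        "pdm", "run", "compendium-keeper", "index",
        "--compendium-file", str(compendium_file),
        "--index-name", index_name,
    ]


def index_all(
    compendium_files: list[Path], index_name: str, dry_run: bool = True
) -> list[list[str]]:
    """Index every file into the same index; with dry_run, only return the commands."""
    commands = [build_index_command(f, index_name) for f in compendium_files]
    if not dry_run:
        for command in commands:
            # check=True stops at the first file that fails to index.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    files = [
        Path("django_2024-12-10.compendium.pickle"),
        Path("flask_2024-12-10.compendium.xml"),
    ]
    for command in index_all(files, "all_python_knowledge"):
        print(" ".join(command))
```

Passing dry_run=False runs each command in turn from the project root.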
Add new vector database backends in the vector_db/ directory.
Edit utils.py to customize how embeddings are generated or processed.
Set Up Environment Variables
Create a .env file as described above.
Generate a Compendium
Use Compendium Scribe to generate a Compendium in pickle or XML format.
Index with Compendium Keeper
Run the indexing command to upload embeddings to your chosen vector database.
Missing API Keys
Ensure that your .env file contains all required API keys. The CLI will notify you if any are missing.
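To check the configuration yourself before running the CLI, a small script along these lines can help. It is a sketch under two assumptions: that all three variables from the .env example are required, and that the .env values have already been loaded into the process environment. The function name missing_keys is hypothetical and is not the CLI's own check.

```python
import os
from typing import Mapping

# Variable names taken from the .env example in this README.
REQUIRED_KEYS = ("OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENVIRONMENT")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing configuration: " + ", ".join(missing))
    else:
        print("All required keys are set.")
```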
Unsupported Vector DB
Currently, only Pinecone is supported. To add support for another vector database, implement a new class in vector_db/ adhering to the VectorDatabase abstract base class.
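The general shape of such a backend might look like the sketch below. The real VectorDatabase base class may declare different methods, so the single upsert method, its signature, and the InMemoryVectorDatabase class are all assumptions made for illustration; check the class in vector_db/ before implementing against it.

```python
from abc import ABC, abstractmethod


class VectorDatabase(ABC):
    """Hypothetical stand-in for the project's abstract base class."""

    @abstractmethod
    def upsert(
        self, index_name: str, vectors: list[tuple[str, list[float], dict]]
    ) -> int:
        """Store (id, embedding, metadata) tuples and return how many were written."""


class InMemoryVectorDatabase(VectorDatabase):
    """Toy backend that keeps vectors in a dict, useful for local experiments."""

    def __init__(self) -> None:
        self.indexes: dict[str, dict[str, tuple[list[float], dict]]] = {}

    def upsert(self, index_name, vectors):
        index = self.indexes.setdefault(index_name, {})
        for vector_id, embedding, metadata in vectors:
            # Re-using an id overwrites the earlier entry, as an upsert should.
            index[vector_id] = (embedding, metadata)
        return len(vectors)
```

Because upsert is marked abstract, Python refuses to instantiate a subclass that forgets to implement it, which surfaces an incomplete backend early.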
File Format Issues
Ensure that the --compendium-file you provide ends in either .compendium.pickle or .compendium.xml. Files with other extensions are not supported.
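The same rule can be checked up front in a script that prepares files for indexing. This is an illustrative sketch only: the helper name compendium_format is hypothetical, and the CLI's own validation may differ (for example in how it treats upper-case extensions).

```python
from pathlib import Path

SUPPORTED_SUFFIXES = (".compendium.pickle", ".compendium.xml")


def compendium_format(path: str) -> str:
    """Return "pickle" or "xml" for a supported file name, else raise ValueError."""
    name = Path(path).name
    for suffix in SUPPORTED_SUFFIXES:
        # Require a non-empty stem in front of the suffix.
        if name.endswith(suffix) and len(name) > len(suffix):
            return suffix.rsplit(".", 1)[1]
    raise ValueError(f"Unsupported Compendium file: {path!r}")
```

Note that a plain .xml or .pickle file is rejected here; the full .compendium.xml or .compendium.pickle ending is required.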
API Rate Limits
Be mindful of OpenAI's API rate limits when indexing large Compendia. Consider implementing batching or rate limiting if necessary.
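One simple way to combine batching with rate limiting is to send texts in fixed-size groups and pause between requests. The sketch below assumes a caller-supplied embed_batch function that wraps whatever embedding call you use; that function, the helper names, and the default batch size and pause are all hypothetical and should be tuned to your account's limits.

```python
import time
from typing import Callable, Iterator, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], list[list[float]]],
    batch_size: int = 100,
    pause_seconds: float = 1.0,
) -> list[list[float]]:
    """Embed texts batch by batch, pausing between calls to stay under a rate limit."""
    embeddings: list[list[float]] = []
    for number, batch in enumerate(batched(texts, batch_size)):
        if number > 0:
            time.sleep(pause_seconds)  # fixed pause; no sleep before the first call
        embeddings.extend(embed_batch(batch))
    return embeddings
```

A fixed pause is the simplest policy; retrying with exponential backoff when the API reports a rate-limit error is a common refinement.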
Contributions are welcome! Feel free to open an issue or submit a pull request.
Compendium Keeper is released under the MIT License.