markdown file query
1.0.0
This project currently works best with English documents.
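This project:

- works with any .md file, so it works perfectly with Notion & Obsidian (though for Notion you have to export your pages to .md manually first)
- splits the text with langchain.textsplitter
- embeds the text with OpenAI (langchain.embeddings.OpenAIEmbeddings)
- stores the embeddings in a Pinecone vector database
- has a --help option

Set your API keys as environment variables:

    export PINECONE_API_KEY="your_pinecone_api_key"
    export OPENAI_API_KEY="your_openai_api_key"

Check that they are set by running the following in Python:

    import os
    os.environ["PINECONE_API_KEY"]
    os.environ["OPENAI_API_KEY"]

Neither lookup should raise a KeyError. Restart the terminal once the variables are set (and your IDE if you are using one).

Clone the repository and install the dependencies:

    git clone https://github.com/madeyexz/markdown-file-query.git
    pip install pinecone langchain tqdm

Put your .md files in a folder named FOLDER (or any name you like, but you have to change the code accordingly). Note that this folder should be in the same directory as main.py.

Run the main.py program:

    python3 main.py "PATH_OF_FOLDER" "QUESTION"

The answer and the retrieved contents are written to answer.txt and contents.txt respectively.

To ask further questions about the same documents, use query_only.py to avoid re-embedding the documents:

    python3 query_only.py "QUESTION"

For example, say I have a folder named markdown_database which contains a bunch of .md files, and I want to query this database with the question "What's the strange situation":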
❯ python3 main.py "markdown_database" "what's the strange situation"
initiating pinecone index...
digesting docs...
uploading datas to pinecone...
92%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 60/65 [00:29<00:02, 1.87it/s]
let's wait for 60 seconds to avoid RateLimitError... (since im not a paid user))
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 60/60 [01:00<00:00, 1.00s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 65/65 [01:32<00:00, 1.42s/it]
querying pinecone...
querying gpt...
writing results to answer.txt and contents.txt
done! the answer to 'what's the strange situation' is: '
The Strange Situation is a standardized procedure devised by Mary Ainsworth in the 1970s to observe attachment security in children within the context of caregiver relationships. It applies to infants between the age of nine and 18 months and involves a series of eight episodes lasting approximately 3 minutes each, whereby a mother, child and stranger are introduced, separated and reunited. The procedure is used to observe the quality of a young child’s attachment to his or her mother, and can also be applied to other attachment figures, such as God, through the use of Emotionally Focused Therapy (EFT) and religious beliefs, such as the saying “there are no atheists in foxholes”.'
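The log above pauses for 60 seconds after 60 uploads to avoid a RateLimitError. A minimal sketch of that pattern follows. The names `upload_with_pauses`, `upload` and the injectable `sleep` are hypothetical and are not taken from main.py; the defaults of 60 items and 60 seconds simply mirror the log.

```python
import time


def upload_with_pauses(items, upload, batch_size=60, pause_seconds=60,
                       sleep=time.sleep):
    """Call upload(item) for every item, pausing after each full batch.

    `upload` is a hypothetical callable standing in for whatever sends one
    chunk to the vector database. `sleep` is injectable so the pause can be
    observed without actually waiting.
    """
    items = list(items)
    uploaded = 0
    for item in items:
        upload(item)
        uploaded += 1
        # Pause after each full batch, but not after the very last item.
        if uploaded % batch_size == 0 and uploaded < len(items):
            sleep(pause_seconds)
    return uploaded


# Demo with stand-ins: record uploads and pauses instead of doing real work.
sent = []
pauses = []
count = upload_with_pauses(range(65), sent.append, sleep=pauses.append)
print(count, pauses)  # 65 [60]
```

With 65 chunks this yields a single 60-second pause, which matches the shape of the run shown above. A paid account with higher limits could raise `batch_size` or set `pause_seconds` to 0.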
Use query_only.py to avoid re-embedding the documents:
❯ python3 query_only.py "Who is Mary Ainsworth?"
connecting to pinecone index...
getting docs
querying pinecone...
querying gpt...
done! the answer to 'Who is Mary Ainsworth?' is: '
Mary Ainsworth was a developmental psychologist who devised the Strange Situation in the 1970s to observe attachment security in children within the context of caregiver relationships. The Strange Situation involves a series of eight episodes lasting approximately 3 minutes each, whereby a mother, child and stranger are introduced, separated and reunited. Ainsworth is also known for her observation that if you want to see the quality of a young child’s attachment to his or her mother, watch what the child does, not when Mother leaves, but when she returns. She is also known for her research on anxious babies and their inability to use their mothers as a secure base.'
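The run above reuses the existing index: it looks up the stored chunks closest to the question and hands them to GPT. The sketch below illustrates that retrieval idea with an in-memory stand-in. It is not Pinecone's API and not this project's code; `cosine_similarity`, `top_k_chunks`, `build_prompt`, the prompt wording and the two-dimensional toy vectors are all assumptions made for illustration (real embeddings have many more dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(question_vector, stored, k=3):
    """Return the texts of the k stored (text, vector) pairs most similar
    to the question vector, best match first."""
    ranked = sorted(
        stored,
        key=lambda pair: cosine_similarity(question_vector, pair[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Combine the retrieved chunks and the question into one prompt."""
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Toy data: made-up vectors, chosen only so the ranking is easy to follow.
stored = [
    ("Mary Ainsworth devised the Strange Situation.", [1.0, 0.0]),
    ("Unrelated note about groceries.", [0.0, 1.0]),
    ("The procedure has eight episodes.", [0.9, 0.1]),
]
best = top_k_chunks([1.0, 0.0], stored, k=2)
prompt = build_prompt("Who is Mary Ainsworth?", best)
print(best)
```

A vector database performs the same nearest-neighbour lookup at scale, which is why the documents only need to be embedded once.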
If you use Pinecone, then whenever you want to query a new set of documents (i.e. create a new database), you should create a new Pinecone index or delete the old one, so that answers are not drawn from the old documents. Vectors already stored in an index stay there until they are removed.
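Removing a stale index can also be scripted. Below is a minimal sketch of a delete-if-present check. It is written against any client object that exposes `list_indexes().names()` and `delete_index(name)`, which recent versions of the Pinecone Python client provide to my knowledge; treat that as an assumption about your installed version. `delete_index_if_present` and `FakeClient` are hypothetical names, and this is not the project's own deletion script.

```python
def delete_index_if_present(client, name):
    """Delete the index called `name`.

    Returns True if an index was deleted and False if no such index existed.
    """
    existing = list(client.list_indexes().names())
    if name not in existing:
        return False
    client.delete_index(name)
    return True


class FakeClient:
    """In-memory stand-in so the sketch runs without an API key."""

    def __init__(self, names):
        self._names = list(names)

    def list_indexes(self):
        # The stand-in returns itself; only the .names() method is needed.
        return self

    def names(self):
        return list(self._names)

    def delete_index(self, name):
        self._names.remove(name)


fake = FakeClient(["old-notes", "other"])
print(delete_index_if_present(fake, "old-notes"))  # True
print(delete_index_if_present(fake, "old-notes"))  # False
```

Passing the client in as an argument keeps the check easy to try out before pointing it at a real account.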
To delete the old index:

    python3 delete_pinecone_index.py NAME_OF_INDEX

Huge shout out to the open-source community for providing straightforward examples and comprehensive tutorials!