A collection of scripts to streamline the translation of markdown files using vector stores and deep learning.
This toolkit provides a set of Python scripts designed to simplify the translation process for markdown files. The scripts leverage embedding models to enhance the accuracy of document retrieval and improve the overall translation workflow.
The search_word.py script initializes a search engine that retrieves relevant documents based on embeddings. It is designed to work with markdown files in multiple languages, and its CLI is built with Typer.
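A minimal Typer app with a single `run` command might look like the sketch below; the command and option names here are assumptions for illustration, not the project's actual interface:

```python
# Illustrative Typer CLI skeleton in the spirit of search_word.py.
# The "run" command and "--query" option are assumed names, not the real API.
import typer

app = typer.Typer()


@app.command()
def run(query: str = "") -> None:
    """Search the translated documents for the given query (illustrative)."""
    typer.echo(f"Searching for: {query}")


if __name__ == "__main__":
    app()
```

Because the app has a single command, Typer exposes it directly, so it can be invoked as `python app.py --query word`.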
Installation:
Install from PyPI:

```bash
pip install fatush
```

Or run:

```bash
git clone https://github.com/alperiox/fatush.git
cd fatush
pip install -r requirements.txt  # or: poetry install
```

Configuration:
Run the script:

```bash
python fatush/search_word.py run
```

If the config.yaml file is not found, the script fetches the documents from the FastAPI repository and creates the necessary configuration file.

Processing Documents:
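The processing step can be sketched as follows. The helper names, the per-language directory layout, and the paragraph-level chunking are assumptions for illustration, not the project's actual implementation:

```python
# Illustrative sketch of the document-processing step: collect the markdown
# files for one language and split each into paragraph-level chunks.
# The docs_path/lang layout is an assumption, not the project's real layout.
from pathlib import Path


def load_markdown_documents(docs_path: str, lang: str) -> dict[str, str]:
    """Map each markdown file (relative path) under docs_path/lang to its text."""
    base = Path(docs_path) / lang
    return {
        str(p.relative_to(base)): p.read_text(encoding="utf-8")
        for p in base.rglob("*.md")
    }


def split_into_chunks(text: str) -> list[str]:
    """Split a document into non-empty paragraph chunks."""
    return [chunk.strip() for chunk in text.split("\n\n") if chunk.strip()]
```

Chunking at paragraph boundaries keeps each embedded unit small enough for retrieval while preserving local context.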
Loading Embedding Model:
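The README does not state which embedding model is loaded, so the sketch below only shows the text-to-vector interface such a model provides, using a hashed bag-of-words as a stand-in where a real setup would call a pretrained model:

```python
# Minimal stand-in for the embedding step. The real toolkit presumably loads
# a pretrained embedding model; this hashed bag-of-words embedder only
# illustrates the text -> fixed-size unit vector interface.
import hashlib
import math


def embed(text: str, dim: int = 64) -> list[float]:
    """Map text to a unit-length vector by hashing tokens into dim buckets."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]
```

Any replacement model just needs to satisfy the same contract: same text in, same fixed-size vector out, deterministically.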
Vector Store:
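At its simplest, a vector store is a list of vectors with nearest-neighbor search. The in-memory sketch below illustrates the add/search mechanics with cosine similarity; how the actual store works, and how it is persisted to vectorstore_path, may differ:

```python
# Illustrative in-memory vector store with cosine-similarity search.
# The real project persists its store to disk; this sketch omits that.
import math


class VectorStore:
    def __init__(self) -> None:
        self.vectors: list[list[float]] = []
        self.payloads: list[str] = []

    def add(self, vector: list[float], payload: str) -> None:
        self.vectors.append(vector)
        self.payloads.append(payload)

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        """Return the k payloads most similar to the query vector."""
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a)) or 1.0
            nb = math.sqrt(sum(x * x for x in b)) or 1.0
            return dot / (na * nb)

        scored = sorted(
            ((p, cosine(query, v)) for v, p in zip(self.vectors, self.payloads)),
            key=lambda item: item[1],
            reverse=True,
        )
        return scored[:k]
```

A linear scan like this is fine for a documentation-sized corpus; an approximate-nearest-neighbor index only becomes necessary at much larger scales.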
Search Engine Initialization:
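Initialization then amounts to wiring an embedding function and a store together. The SearchEngine name and interface below are illustrative, assuming an embed function that maps text to a vector and a store exposing a search(vector, k) method:

```python
# Illustrative search-engine wrapper: embed the query text, then look the
# vector up in the store. Names and interfaces are assumptions, not the
# project's actual API.
from typing import Callable


class SearchEngine:
    def __init__(self, embed_fn: Callable[[str], list[float]], store) -> None:
        self.embed_fn = embed_fn
        self.store = store

    def query(self, text: str, k: int = 3):
        """Return the k store entries most similar to the query text."""
        return self.store.search(self.embed_fn(text), k=k)
```

Keeping the engine ignorant of how embeddings are computed makes it trivial to swap the embedding model without touching retrieval code.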
Configuration Options (search_word.py, config.yaml):

- source_lang: Source language code (e.g., 'en').
- translation_lang: Translation language code (e.g., 'tr').
- docs_path: Path to the documents (default: the current working directory).
- vectorstore_path: Path to the vector store (default: the current working directory).

TODOs:

Since the project is built on my experience with translating the FastAPI documentation, a nicer abstraction is a must for a more generally usable toolset: several variables are hard-coded at the moment, such as fetching the documentation from the FastAPI repository.
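For reference, a config.yaml matching the configuration options listed above might look like this (all values illustrative):

```yaml
# Hypothetical config.yaml for search_word.py; values are examples only.
source_lang: en
translation_lang: tr
docs_path: .
vectorstore_path: .
```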