Vector_db_with_llm
FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.8+ based on standard Python type hints. This repository is used to deliver records to the Prometheus-Export application.
LangChain (https://github.com/langchain-ai/langchain) is an open-source framework for developing applications powered by large language models (LLMs). LLMs are large deep-learning models pre-trained on vast amounts of data that can generate responses to user queries, for example answering questions or creating images from text prompts. Prompts are the queries people use to seek answers from an LLM.
VectorDB (e.g. Milvus, Faiss (Facebook AI Similarity Search), Chroma, Qdrant, Pinecone): a vector database (VectorDB) is designed to store and manage vector data, and is often used in machine learning and AI applications. Vector data refers to numerical representations of objects, which can be used for similarity search, clustering, and other tasks. (Reference: https://krishna-yogik.medium.com/vectordb-tutorial-a-beginners-guide-06dc33fac2f)
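To make the idea of similarity search concrete, here is a minimal sketch in plain Python. It is a brute-force illustration, not how Milvus, Faiss, or the other engines listed above are implemented; the function names, document IDs, and 3-dimensional vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; real ones come from an embedding model
# and usually have hundreds of dimensions.
VECTORS = {
    "doc_train": [0.9, 0.1, 0.0],
    "doc_fare": [0.7, 0.3, 0.1],
    "doc_weather": [0.0, 0.2, 0.9],
}

results = top_k([1.0, 0.0, 0.0], VECTORS, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

A real vector database replaces the linear scan with an index so that lookups stay fast as the collection grows.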
RAG (Retrieval-Augmented Generation) is an AI technique that lets businesses automatically embed their latest and most relevant proprietary data directly into their LLM prompt.
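The retrieve-then-prompt flow behind RAG can be sketched as follows. This is an illustration under stated assumptions, not this repository's implementation: retrieval here is a simple word-overlap ranking standing in for a vector search, the helper names are hypothetical, and the sample documents are short English paraphrases written for this example.

```python
def tokenize(text):
    """Lower-case the text and split on whitespace (deliberately naive)."""
    return set(text.lower().split())


def retrieve(question, documents, k=2):
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(documents, key=lambda doc: len(question_words & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_docs):
    """Place the retrieved passages ahead of the question in the LLM prompt."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical document store
DOCUMENTS = [
    "KTX fares and travel times may change at the operator's discretion.",
    "Streamlit is a Python framework for data apps.",
    "The minimum KTX fare is 8,100 won.",
]

question = "What is the minimum KTX fare?"
prompt = build_prompt(question, retrieve(question, DOCUMENTS, k=2))
print(prompt)
```

The resulting string is what would be sent to the LLM; the model call itself is left out because it depends on the provider.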
Apache Flink: Apache Kafka and Apache Flink are two powerful tools in big data and stream processing. While Kafka is known for its robust messaging system, Flink excels at real-time stream processing and analysis.
Streamlit is an open-source Python framework for data scientists and AI/ML engineers to deliver interactive data apps.

    pip install streamlit
    pip install streamlit-chat
    streamlit run [streamlit-filename.py] [--server.port 30001]
    streamlit run app.py

    black==23.3.0
    mypy==1.4.1
    pre-commit==3.3.3
    watchdog
    pytest

Gradio (https://www.gradio.app/guides/quickstart): Gradio is an open-source Python package that lets you quickly build a demo or web application for your machine learning model, API, or any arbitrary Python function. You can then share a link to your demo or web application in seconds using Gradio's built-in sharing features.
    ./gradio-start.sh
    pip install --upgrade gradio

Elasticsearch Gen-AI: https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/generative-ai/chatbot.ipynb
Codespaces (https://github.com/features/codepaces, https://velog.io/@profile_exe/github-codepaces): GitHub Codespaces gets you up and coding faster with fully configured, secure cloud development environments native to GitHub.
PyAutoGUI lets your Python scripts control the mouse and keyboard to automate interactions with other applications. The API is designed to be simple. PyAutoGUI works on Windows, macOS, and Linux, and runs on Python 2 and 3. To install it with pip, run pip install pyautogui. See the installation page for more details (https://pyautogui.readthedocs.io/en/latest/).
    import time

    import pyautogui

    # Every 10 seconds: print the cursor position, then move to (100, 200) and click there.
    while True:
        print(pyautogui.position())
        pyautogui.moveTo(100, 200)
        pyautogui.click(100, 200)
        # pyautogui.moveTo(200, 200, duration=0.5)
        time.sleep(10)

./jupyter-notebook.sh: http://localhost:8889/tree/langchain/workflow/jupyter-workflow

    sudo yum install gcc openssl-devel bzip2-devel libffi-devel zlib-devel git
    wget https://www.python.org/ftp/python/3.9.0/Python-3.9.0.tgz
    tar -zxvf Python-3.9.0.tgz    # or: tar -xvf Python-3.9.0.tgz
    cd Python-3.9.0
    ./configure --libdir=/usr/lib64
    sudo make
    sudo make altinstall

    # python3 -m venv .venv --without-pip
    sudo yum install python3-pip
    sudo ln -s /usr/lib64/python3.9/lib-dynload/ /usr/local/lib/python3.9/lib-dynload

    # -- From Python 3.10 onward, OpenSSL needs to be installed
    # openssl
    cd /usr/local/src
    wget https://www.openssl.org/source/openssl-1.1.1t.tar.gz
    tar xvf openssl-1.1.1t.tar.gz
    cd openssl-1.1.1t/
    ./config --prefix=/usr/local/ssl --openssldir=/usr/local/ssl shared zlib
    make
    sudo make install
    export LDFLAGS="-L/usr/local/ssl/lib"
    export CPPFLAGS="-I/usr/local/ssl/include"

    # Check the openssl version
    /usr/local/ssl/bin/openssl version
    export LD_LIBRARY_PATH=/usr/local/ssl/lib:$LD_LIBRARY_PATH
    echo $LD_LIBRARY_PATH

    sudo yum install gcc openssl-devel bzip2-devel libffi-devel zlib-devel git
    wget https://www.python.org/ftp/python/3.11.0/Python-3.11.0.tgz
    tar -zxvf Python-3.11.0.tgz    # or: tar -xvf Python-3.11.0.tgz
    cd Python-3.11.0
    # Add the --with-openssl-rpath=auto option so that Python finds the correct OpenSSL library path automatically
    # ./configure --libdir=/usr/lib64 --with-openssl=/usr/local/ssl --with-openssl-rpath=auto
    ./configure --libdir=/usr/lib64 --with-openssl=/usr/bin/ssl --with-openssl-rpath=auto
    sudo make
    sudo make altinstall

    # -- An error like the one below occurs when installing packages via pip
    (.venv) -bash-4.2$ pip install elasticsearch==7.13
    WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
    WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/elasticsearch/
    WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/elasticsearch/
    ERROR: Operation cancelled by user

    # --
    python -m venv .venv
    source .venv/bin/activate

    # -- Swagger
    pip install poetry
    poetry add fastapi
    poetry add uvicorn
    poetry add gunicorn
    poetry add pytz
    poetry add httpx
    poetry add pytest
    poetry add pytest-cov
    poetry add requests
    poetry add pyyaml
    poetry add elasticsearch==7.13
    poetry add python-dotenv
    poetry add jupyter
    # --

    # -- Vector
    poetry config virtualenvs.in-project true
    pip install poetry
    poetry init
    poetry add openai langchain langchainhub tiktoken chromadb langchain-community bs4 python-dotenv
    poetry add sentence-transformers
    poetry add pypdf
    poetry add docx2txt
    poetry add faiss-cpu
    poetry add requests
    pip install --q openai langchain langchainhub tiktoken chromadb langchain-community bs4

    # When an error like this occurs:
    # ImportError: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'OpenSSL 1.0.2k-fips 26 Jan 2017'.
    # See: https://github.com/urllib3/urllib3/issues/2168
    pip install urllib3==1.26.18
    pip install pytz
    pip install requests==2.27.1

    ./service-start.sh

    pip install git+https://github.com/jm/git_pip_install.git

or git+https://oss.navercorp.com/nsml/nsml_notebook.git@branch_name (https://liferecorde.tistory.com/49, https://newsight.tistory.com/296)

./Langchain/workflow/

    curl -X 'POST' 'http://localhost:7001/vector/uploadfile' -H 'accept: application/json' -H 'Content-Type: multipart/form-data' -F '[email protected];type=application/msword'

    {
      "filename": [
        { "index": { "_index": "test_context", "_type": "search" } },
        { "ES_UPLOADED": "JSON_FORMAT", "CONTENT": "This is a test Word document for the TemplatePackage example site." }
      ]
    }

    python ./Langchain/workflow/text_loader.py

    ***
    type : <class 'list'> / len : 1
    data : [Document(metadata={'source': 'C:\Users\euiyoung.hwang\Git_Workspace\Vector_DB_with_LLM/Data/Sample.hwp'}, page_content='KTX 노선도ﺎĀ Ā※ KTX 소요 시간과 운임은 철도청의 사정에 따라 변동할 수 있습니다.Ā서울용산광명Ā천안아산대전서대전동대구익산논산김제밀양정읍구포장성광주부산광주송정나주목포ĀྠĀ 경부선 ྠĀ 호남선ĀĀKTX 소요시간호남선 (서울~천안아산 경부선과 동일)서울서대전논산익산김제정읍장성광주광주송정나주목포시:분6:226:517:15-7:38--8:06-8:38경부선서울용산광명천안아산대전동대구밀양구포부산시:분5:305:456:246:307:187:498:148:27KTX 운임안내최저운임8,100원 (월~목 요금제 기준)호남선 (단위:원)행신8,1008,10015,10024,50026,40028,80031,20032,80035,00037,10039,00040,40044,80040,000용산8,10013,30022,70024,70027,20029,60031,20033,40036,30037,50038,90043,30038,400광명11,30020,70022,60025,10028,60029,40031,60034,50036,40037,10041,50036,700천안아산9,40011,40014,00017,70019,10021,70024,70026,60027,70032,20027,300서대전8,1008,1008,30010,00012,70015,90018,10019,70024,30019,300계룡8,1008,1008,10010,70014,00016,10017,70022,50017,400논산8,1008,1008,10011,40013,50015,10020,20014,800익산8,1008,1008,1009,80011,40016,50011,100김제8,1008,1008,1009,70014,8009,300정읍8,1008,1008,10012,1008,100장성8,1008,1008,9008,100광주송정8,1008,100-나주8,100-목포-광주경부선 
(단위:원)행신8,1008,10014,10022,80039,70044,20047,80049,100서울8,10012,70021,40038,40043,00046,60047,900광명10,50019,20036,50041,00044,60046,000천안아산8,70025,60029,40032,90034,200대전16,90022,10025,30026,700동대구8,1009,30010,800밀양8,1008,100구포8,100부산')] page_content : KTX 노선도ﺎĀ Ā※ KTX 소요 시간과 운임은 철도청의 사정에 따라 변동할 수 있습니다.Ā서울용산광명Ā천안아산대전서대전동대구익산논산김제밀양정읍구포장성광주부산광주송정나주목포ĀྠĀ 경부선 ྠĀ 호남선ĀĀKTX 소요시간호남선 (서울~천안아산 경부선과 동일)서울서대전논산익산김제정읍장성광주광주송정나주목포시:분6:226:517:15-7:38--8:06-8:38경부선서울용산광명천안아산대전동대구밀양구포부산시:분5:305:456:246:307:187:498:148:27KTX 운임안내최저운임8,100원 (월~목 요금제 기준)호남선 (단위:원)행신8,1008,10015,10024,50026,40028,80031,20032,80035,00037,10039,00040,40044,80040,000용산8,10013,30022,70024,70027,20029,60031,20033,40036,30037,50038,90043,30038,400광명11,30020,70022,60025,10028,60029,40031,60034,50036,40037,10041,50036,700천안아산9,40011,40014,00017,70019,10021,70024,70026,60027,70032,20027,300서대전8,1008,1008,30010,00012,70015,90018,10019,70024,30019,300계룡8,1008,1008,10010,70014,00016,10017,70022,50017,400논산8,1008,1008,10011,40013,50015,10020,20014,800익산8,1008,1008,1009,80011,40016,50011,100김제8,1008,1008,1009,70014,8009,300정읍8,1008,1008,10012,1008,100장성8,1008,1008,9008,100광주송정8,1008,100-나주8,100-목포-광주경부선 (단위:원)행신8,1008,10014,10022,80039,70044,20047,80049,100서울8,10012,70021,40038,40043,00046,60047,900광명10,50019,20036,50041,00044,60046,000천안아산8,70025,60029,40032,90034,200대전16,90022,10025,30026,700동대구8,1009,30010,800밀양8,1008,100구포8,100부산 *** [ { " index " : { " _index " : " test_context " , " _type " : " search " } }, { " ES_UPLOADED " : " JSON_FORMAT " , "CONTENT": "KTX 노선도ﺎĀ Ā※ KTX 소요 시간과 운임은 철도청의 사정에 따라 변동할 수 있습니다.Ā서울용산광명Ā천안아산대전서대전동대구익산논산김제밀양정읍구포장성광주부산광주송정나주목포ĀྠĀ 경부선 ྠĀ 호남선ĀĀKTX 소요시간호남선 (서울~천안아산 경부선과 동일)서울서대전논산익산김제정읍장성광주광주송정나주목포시:분6:226:517:15-7:38--8:06-8:38경부선서울용산광명천안아산대전동대구밀양구포부산시:분5:305:456:246:307:187:498:148:27KTX 운임안내최저운임8,100원 (월~목 요금제 기준)호남선 
(단위:원)행신8,1008,10015,10024,50026,40028,80031,20032,80035,00037,10039,00040,40044,80040,000용산8,10013,30022,70024,70027,20029,60031,20033,40036,30037,50038,90043,30038,400광명11,30020,70022,60025,10028,60029,40031,60034,50036,40037,10041,50036,700천안아산9,40011,40014,00017,70019,10021,70024,70026,60027,70032,20027,300서대전8,1008,1008,30010,00012,70015,90018,10019,70024,30019,300계룡8,1008,1008,10010,70014,00016,10017,70022,50017,400논산8,1008,1008,10011,40013,50015,10020,20014,800익산8,1008,1008,1009,80011,40016,50011,100김제8,1008,1008,1009,70014,8009,300정읍8,1008,1008,10012,1008,100장성8,1008,1008,9008,100광주송정8,1008,100-나주8,100-목포-광주경부선 (단위:원)행신8,1008,10014,10022,80039,70044,20047,80049,100서울8,10012,70021,40038,40043,00046,60047,900광명10,50019,20036,50041,00044,60046,000천안아산8,70025,60029,40032,90034,200대전16,90022,10025,30026,700동대구8,1009,30010,800밀양8,1008,100구포8,100부산" } ]

http://localhost:7001/docs
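The JSON printed above pairs an index action with a source document, which is the shape Elasticsearch bulk requests use. A minimal sketch of how such a payload could be assembled from extracted text follows; `build_bulk_payload` is a hypothetical helper written for this illustration and is not taken from text_loader.py. The field names and default values mirror the sample output.

```python
import json


def build_bulk_payload(text, index_name="test_context", doc_type="search"):
    """Pair an 'index' action line with the document source, as in the sample output."""
    action = {"index": {"_index": index_name, "_type": doc_type}}
    source = {"ES_UPLOADED": "JSON_FORMAT", "CONTENT": text}
    return [action, source]


payload = build_bulk_payload(
    "This is a test Word document for the TemplatePackage example site."
)

# ensure_ascii=False keeps non-ASCII text (such as the Korean sample) readable.
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Keeping the action and the source as separate dictionaries makes it straightforward to serialise them one per line when the payload is sent to the bulk endpoint.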
    source .venv/bin/activate
    poetry run py.test -v --junitxml=test-reports/junit/pytest.xml --cov-report html --cov tests/

or ./pytest.sh

    $ ./pytest.sh
    tests\test_api.py::test_skip SKIPPED (no way of currently testing this)   [ 50%]
    tests\test_api.py::test_api PASSED                                        [100%]

    ---------- coverage: platform win32, python 3.11.7-final-0 -----------
    Name                              Stmts   Miss  Cover   Missing
    ----------------------------------------------------------------
    config\config.py                      8      4    50%   16-20, 33
    config\log_config.py                 32      1    97%   42
    controller\__init__.py                0      0   100%
    controller\cluster_controller.py     14      0   100%
    injector.py                          25      0   100%
    main.py                              23      9    61%   38-57, 69
    service\__init__.py                   0      0   100%
    service\es_search_handler.py        139    103    26%   32-88, 131-132, 139-155, 160-173, 178-191, 196-213, 219-254, 259-271, 276-287, 292-304, 309-315
    service\es_util.py                   21     16    24%   5-11, 16-19, 24-35, 40-41
    service\query_builder.py             42     27    36%   21-40, 48-51, 64-83, 89-98, 102-137
    service\status_handler.py            13      2    85%   12, 19
    tests\__init__.py                     0      0   100%
    tests\conftest.py                     8      0   100%
    tests\test_api.py                     9      1    89%   7
    ----------------------------------------------------------------
    TOTAL                               334    163    51%
    $

./circleci/config.yml

GitHub Actions (./.github/workflows/build-and-test.yml): GitHub Actions is a continuous integration and continuous delivery (CI/CD) platform that lets you automate your build, test, and deployment pipeline.
You can create workflows that build and test every pull request to your repository, or deploy merged pull requests to production.

    # -- /etc/systemd/system/vector_interface_api.service
    [Unit]
    Description=Swagger ES Service

    [Service]
    User=devuser
    Group=devuser
    Type=simple
    ExecStart=/bin/bash /home/devuser/Git_Repo/service-start.sh
    ExecStop=/usr/bin/killall vector_interface_api

    [Install]
    WantedBy=default.target

    # Service command
    sudo systemctl daemon-reload
    sudo systemctl start vector_interface_api.service
    sudo systemctl status vector_interface_api.service
    sudo systemctl stop vector_interface_api.service
    sudo service vector_interface_api status/stop/start