Vector_db_with_llm
FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.8+ based on standard Python type hints. This repository delivers records to the Prometheus-Export application.
LangChain (https://github.com/langchain-ai/langchain) is an open-source framework for developing applications powered by large language models (LLMs). LLMs are large deep-learning models pre-trained on vast amounts of data that can generate responses to user queries, for example answering questions or creating images from text-based prompts. Prompts are the queries people use to get responses from an LLM.
VectorDB (e.g., Milvus, FAISS (Facebook AI Similarity Search), Chroma, Qdrant, Pinecone): a vector database (VectorDB) is designed to store and manage vector data and is often used in machine learning and AI applications. Vector data refers to numerical representations of objects, which can be used for similarity search, clustering, and other tasks. (Reference: https://krishna-yogik.medium.com/vectordb-tutorial-a-beginners-guide-06dc3333fac2f)
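To make "similarity search" concrete, here is a minimal sketch that ranks stored vectors by cosine similarity to a query. It assumes a brute-force, in-memory store: `TinyVectorStore`, its methods, and the three-dimensional sample vectors are hypothetical and are not taken from this repository or from any of the vector databases listed above, which add indexing and persistence on top of the same idea.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store used only for illustration."""

    def __init__(self):
        self._items = []  # list of (doc_id, vector) pairs

    def add(self, doc_id, vector):
        self._items.append((doc_id, list(vector)))

    def search(self, query, k=2):
        """Return the k stored ids most similar to the query, best first."""
        scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(doc_id, score) for score, doc_id in scored[:k]]


# Made-up embeddings; a real application would get these from an embedding model.
store = TinyVectorStore()
store.add("cat", [0.9, 0.1, 0.0])
store.add("dog", [0.8, 0.2, 0.1])
store.add("car", [0.0, 0.1, 0.9])

results = store.search([1.0, 0.0, 0.0], k=2)
print(results)
```

The brute-force scan is linear in the number of stored vectors, which is fine for a demo but is the part a dedicated vector database replaces with an approximate nearest-neighbor index.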
RAG (Retrieval-Augmented Generation) is an AI technique that lets companies automatically embed their most current and relevant proprietary data directly into the prompt sent to an LLM.
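The retrieve-then-prompt flow can be sketched without any LLM call. The example below is an assumption-laden toy: the retriever ranks documents by simple word overlap instead of embeddings, and `tokenize`, `retrieve`, `build_prompt`, and the sample documents are all hypothetical names and data invented for illustration.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(question, documents, k=2):
    """Toy retriever: rank documents by word overlap with the question."""
    q_words = tokenize(question)
    ranked = sorted(documents, key=lambda doc: len(q_words & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_docs):
    """Embed the retrieved documents into the prompt ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


# Made-up proprietary documents standing in for a real knowledge base.
documents = [
    "Our refund policy allows returns within 30 days.",
    "The office is closed on public holidays.",
    "Refund requests are processed within 5 business days.",
]

question = "How long does a refund take?"
prompt = build_prompt(question, retrieve(question, documents, k=2))
print(prompt)
```

In a real pipeline the retrieval step would typically query a vector database, and the resulting prompt would be sent to the LLM; only the prompt assembly is shown here.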
Apache Flink: Apache Kafka and Apache Flink are two powerful tools for big data and stream processing. While Kafka is known for its robust messaging system, Flink excels at real-time stream processing and analytics.
Streamlit is an open-source Python framework for data scientists and AI/ML engineers to deliver interactive data apps.
    pip install streamlit
    pip install streamlit-chat
    streamlit run [streamlit-filename.py] [--server.port 30001]
    streamlit run app.py

    black==23.3.0
    mypy==1.4.1
    pre-commit==3.3.3
    watchdog
    pytest

Gradio (https://www.gradio.app/guides/quickstart): Gradio is an open-source Python package that lets you quickly build a demo or web application for your machine learning model, API, or any arbitrary Python function. You can then share a link to your demo or web application in just a few seconds using Gradio's built-in sharing features.
    ./gradio-start.sh
    pip install --upgrade gradio

Elasticsearch Gen-AI: https://github.com/elastic/elasticsearch-labs/blob/Main/notebooks/generative-ai/chatbot.ipynb
Codespaces (https://github.com/features/codespaces, https://velog.io/@profile_exe/github-codespaces): GitHub Codespaces gets you up and coding faster with fully configured, secure cloud development environments native to GitHub.
PyAutoGUI lets your Python scripts control the mouse and keyboard to automate interactions with other applications. The API is designed to be simple. PyAutoGUI works on Windows, macOS, and Linux, and runs on Python 2 and 3. To install with pip, run pip install pyautogui. See the installation page for more details (https://pyautogui.readthedocs.io/en/latest/).
    import time

    import pyautogui

    while True:
        # Print the current pointer position, then move to and click (100, 200).
        print(pyautogui.position())
        pyautogui.moveTo(100, 200)
        pyautogui.click(100, 200)
        # pyautogui.moveTo(200, 200, duration=0.5)
        time.sleep(10)

Jupyter notebook (./jupyter-notebook.sh): http://localhost:8889/tree/Langchain/workflow/jupyter-workflow

    sudo yum install gcc openssl-devel bzip2-devel libffi-devel zlib-devel git
    wget https://www.python.org/ftp/python/3.9.0/Python-3.9.0.tgz
    tar -zxvf Python-3.9.0.tgz   # or: tar -xvf Python-3.9.0.tgz
    cd Python-3.9.0
    ./configure --libdir=/usr/lib64
    sudo make
    sudo make altinstall

    # python3 -m venv .venv --without-pip
    sudo yum install python3-pip
    sudo ln -s /usr/lib64/python3.9/lib-dynload/ /usr/local/lib/python3.9/lib-dynload

    # -- From Python 3.10 onward, OpenSSL 1.1.1 or newer must be installed first
    # openssl
    cd /usr/local/src
    wget https://www.openssl.org/source/openssl-1.1.1t.tar.gz
    tar xvf openssl-1.1.1t.tar.gz
    cd openssl-1.1.1t/
    ./config --prefix=/usr/local/ssl --openssldir=/usr/local/ssl shared zlib
    make
    sudo make install
    export LDFLAGS="-L/usr/local/ssl/lib"
    export CPPFLAGS="-I/usr/local/ssl/include"

    # Verify openssl
    /usr/local/ssl/bin/openssl version
    export LD_LIBRARY_PATH=/usr/local/ssl/lib:$LD_LIBRARY_PATH
    echo $LD_LIBRARY_PATH

    sudo yum install gcc openssl-devel bzip2-devel libffi-devel zlib-devel git
    wget https://www.python.org/ftp/python/3.11.0/Python-3.11.0.tgz
    tar -zxvf Python-3.11.0.tgz   # or: tar -xvf Python-3.11.0.tgz
    cd Python-3.11.0
    # Add the --with-openssl-rpath=auto option so that Python automatically finds the correct OpenSSL library path
    # ./configure --libdir=/usr/lib64 --with-openssl=/usr/local/ssl --with-openssl-rpath=auto
    ./configure --libdir=/usr/lib64 --with-openssl=/usr/bin/ssl --with-openssl-rpath=auto
    sudo make
    sudo make altinstall

    # -- An error like the one below occurs when installing packages via pip
    (.venv) -bash-4.2$ pip install elasticsearch==7.13
    WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
    WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/elasticsearch/
    WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/elasticsearch/
    ERROR: Operation cancelled by user

    # --
    python -m venv .venv
    source .venv/bin/activate

    # -- Swagger
    pip install poetry
    poetry add fastapi
    poetry add uvicorn
    poetry add gunicorn
    poetry add pytz
    poetry add httpx
    poetry add pytest
    poetry add pytest-cov
    poetry add requests
    poetry add pyyaml
    poetry add elasticsearch==7.13
    poetry add python-dotenv
    poetry add jupyter
    # --

    # -- Vector
    poetry config virtualenvs.in-project true
    pip install poetry
    poetry init
    poetry add openai langchain langchainhub tiktoken chromadb langchain-community bs4 python-dotenv
    poetry add sentence-transformers
    poetry add pypdf
    poetry add docx2txt
    poetry add faiss-cpu
    poetry add requests
    pip install --q openai langchain langchainhub tiktoken chromadb langchain-community bs4

    # When an error like this occurs:
    # ImportError: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'OpenSSL 1.0.2k-fips 26 Jan 2017'.
    # See: https://github.com/urllib3/urllib3/issues/2168
    pip install urllib3==1.26.18
    pip install pytz
    pip install requests==2.27.1

    ./service-start.sh

    pip install git+https://github.com/jm/git_pip_install.git
    # or
    pip install git+https://oss.navercorp.com/nsml/nsml_notebook.git@branch_name

(https://liferecorde.tistory.com/49, https://newsight.tistory.com/296)

./Langchain/workflow/

    curl -X 'POST' 'http://localhost:7001/vector/uploadfile' -H 'accept: application/json' -H 'Content-Type: multipart/form-data' -F '[email protected];type=application/msword'

    {
      "filename": [
        {"index": {"_index": "test_context", "_type": "search"}},
        {"ES_UPLOADED": "JSON_FORMAT", "CONTENT": "This is a test Word document for the TemplatePackage example site."}
      ]
    }

    python ./Langchain/workflow/text_loader.py

***
type : <class 'list'> / len : 1
data : [Document(metadata={'source': 'C:\Users\euiyoung.hwang\Git_Workspace\Vector_DB_with_LLM/Data/Sample.hwp'}, page_content='KTX 노선도ﺎĀ Ā※ KTX 소요 시간과 운임은 철도청의 사정에 따라 변동할 수 있습니다.Ā서울용산광명Ā천안아산대전서대전동대구익산논산김제밀양정읍구포장성광주부산광주송정나주목포ĀྠĀ 경부선 ྠĀ 호남선ĀĀKTX 소요시간호남선 (서울~천안아산 경부선과 동일)서울서대전논산익산김제정읍장성광주광주송정나주목포시:분6:226:517:15-7:38--8:06-8:38경부선서울용산광명천안아산대전동대구밀양구포부산시:분5:305:456:246:307:187:498:148:27KTX 운임안내최저운임8,100원 (월~목 요금제 기준)호남선 (단위:원)행신8,1008,10015,10024,50026,40028,80031,20032,80035,00037,10039,00040,40044,80040,000용산8,10013,30022,70024,70027,20029,60031,20033,40036,30037,50038,90043,30038,400광명11,30020,70022,60025,10028,60029,40031,60034,50036,40037,10041,50036,700천안아산9,40011,40014,00017,70019,10021,70024,70026,60027,70032,20027,300서대전8,1008,1008,30010,00012,70015,90018,10019,70024,30019,300계룡8,1008,1008,10010,70014,00016,10017,70022,50017,400논산8,1008,1008,10011,40013,50015,10020,20014,800익산8,1008,1008,1009,80011,40016,50011,100김제8,1008,1008,1009,70014,8009,300정읍8,1008,1008,10012,1008,100장성8,1008,1008,9008,100광주송정8,1008,100-나주8,100-목포-광주경부선 
(단위:원)행신8,1008,10014,10022,80039,70044,20047,80049,100서울8,10012,70021,40038,40043,00046,60047,900광명10,50019,20036,50041,00044,60046,000천안아산8,70025,60029,40032,90034,200대전16,90022,10025,30026,700동대구8,1009,30010,800밀양8,1008,100구포8,100부산')]
page_content : KTX 노선도ﺎĀ Ā※ KTX 소요 시간과 운임은 철도청의 사정에 따라 변동할 수 있습니다.Ā서울용산광명Ā천안아산대전서대전동대구익산논산김제밀양정읍구포장성광주부산광주송정나주목포ĀྠĀ 경부선 ྠĀ 호남선ĀĀKTX 소요시간호남선 (서울~천안아산 경부선과 동일)서울서대전논산익산김제정읍장성광주광주송정나주목포시:분6:226:517:15-7:38--8:06-8:38경부선서울용산광명천안아산대전동대구밀양구포부산시:분5:305:456:246:307:187:498:148:27KTX 운임안내최저운임8,100원 (월~목 요금제 기준)호남선 (단위:원)행신8,1008,10015,10024,50026,40028,80031,20032,80035,00037,10039,00040,40044,80040,000용산8,10013,30022,70024,70027,20029,60031,20033,40036,30037,50038,90043,30038,400광명11,30020,70022,60025,10028,60029,40031,60034,50036,40037,10041,50036,700천안아산9,40011,40014,00017,70019,10021,70024,70026,60027,70032,20027,300서대전8,1008,1008,30010,00012,70015,90018,10019,70024,30019,300계룡8,1008,1008,10010,70014,00016,10017,70022,50017,400논산8,1008,1008,10011,40013,50015,10020,20014,800익산8,1008,1008,1009,80011,40016,50011,100김제8,1008,1008,1009,70014,8009,300정읍8,1008,1008,10012,1008,100장성8,1008,1008,9008,100광주송정8,1008,100-나주8,100-목포-광주경부선 (단위:원)행신8,1008,10014,10022,80039,70044,20047,80049,100서울8,10012,70021,40038,40043,00046,60047,900광명10,50019,20036,50041,00044,60046,000천안아산8,70025,60029,40032,90034,200대전16,90022,10025,30026,700동대구8,1009,30010,800밀양8,1008,100구포8,100부산
***
[
  {"index": {"_index": "test_context", "_type": "search"}},
  {"ES_UPLOADED": "JSON_FORMAT", "CONTENT": "KTX 노선도ﺎĀ Ā※ KTX 소요 시간과 운임은 철도청의 사정에 따라 변동할 수 있습니다.Ā서울용산광명Ā천안아산대전서대전동대구익산논산김제밀양정읍구포장성광주부산광주송정나주목포ĀྠĀ 경부선 ྠĀ 호남선ĀĀKTX 소요시간호남선 (서울~천안아산 경부선과 동일)서울서대전논산익산김제정읍장성광주광주송정나주목포시:분6:226:517:15-7:38--8:06-8:38경부선서울용산광명천안아산대전동대구밀양구포부산시:분5:305:456:246:307:187:498:148:27KTX 운임안내최저운임8,100원 (월~목 요금제 기준)호남선 (단위:원)행신8,1008,10015,10024,50026,40028,80031,20032,80035,00037,10039,00040,40044,80040,000용산8,10013,30022,70024,70027,20029,60031,20033,40036,30037,50038,90043,30038,400광명11,30020,70022,60025,10028,60029,40031,60034,50036,40037,10041,50036,700천안아산9,40011,40014,00017,70019,10021,70024,70026,60027,70032,20027,300서대전8,1008,1008,30010,00012,70015,90018,10019,70024,30019,300계룡8,1008,1008,10010,70014,00016,10017,70022,50017,400논산8,1008,1008,10011,40013,50015,10020,20014,800익산8,1008,1008,1009,80011,40016,50011,100김제8,1008,1008,1009,70014,8009,300정읍8,1008,1008,10012,1008,100장성8,1008,1008,9008,100광주송정8,1008,100-나주8,100-목포-광주경부선 (단위:원)행신8,1008,10014,10022,80039,70044,20047,80049,100서울8,10012,70021,40038,40043,00046,60047,900광명10,50019,20036,50041,00044,60046,000천안아산8,70025,60029,40032,90034,200대전16,90022,10025,30026,700동대구8,1009,30010,800밀양8,1008,100구포8,100부산"}
]

Swagger UI: http://localhost:7001/docs
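Both the upload response and the text loader output above wrap the extracted text in the same two-element shape: an index action followed by a document body. A minimal sketch of building that shape is shown below. The helper name `to_bulk_payload` and its parameters are hypothetical; the repository's actual code may assemble the payload differently.

```python
import json


def to_bulk_payload(text, index="test_context", doc_type="search"):
    """Wrap extracted text in the action/document pair seen in the sample output."""
    return [
        {"index": {"_index": index, "_type": doc_type}},
        {"ES_UPLOADED": "JSON_FORMAT", "CONTENT": text},
    ]


payload = to_bulk_payload(
    "This is a test Word document for the TemplatePackage example site."
)

# ensure_ascii=False keeps non-ASCII text (such as the Korean sample) readable.
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Keeping the index name and type as parameters makes it easy to reuse the same helper for other target indices.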
    source .venv/bin/activate
    poetry run py.test -v --junitxml=test-reports/junit/pytest.xml --cov-report html --cov tests/
    # or
    ./pytest.sh

    $ ./pytest.sh
    tests\test_api.py::test_skip SKIPPED (no way of currently testing this)   [ 50%]
    tests\test_api.py::test_api PASSED                                        [100%]

    ---------- coverage: platform win32, python 3.11.7-final-0 -----------
    Name                               Stmts   Miss  Cover   Missing
    ----------------------------------------------------------------
    config\config.py                       8      4    50%   16-20, 33
    config\log_config.py                  32      1    97%   42
    controller\__init__.py                 0      0   100%
    controller\cluster_controller.py      14      0   100%
    injector.py                           25      0   100%
    main.py                               23      9    61%   38-57, 69
    service\__init__.py                    0      0   100%
    service\es_search_handler.py         139    103    26%   32-88, 131-132, 139-155, 160-173, 178-191, 196-213, 219-254, 259-271, 276-287, 292-304, 309-315
    service\es_util.py                    21     16    24%   5-11, 16-19, 24-35, 40-41
    service\query_builder.py              42     27    36%   21-40, 48-51, 64-83, 89-98, 102-137
    service\status_handler.py             13      2    85%   12, 19
    tests\__init__.py                      0      0   100%
    tests\conftest.py                      8      0   100%
    tests\test_api.py                      9      1    89%   7
    ----------------------------------------------------------------
    TOTAL                                334    163    51%
    $

CircleCI (./circleci/config.yml): CircleCI is a continuous integration and continuous delivery platform that helps software teams work smarter and faster. With CircleCI, every commit kicks off a new job on its platform, and the code is built, tested, and deployed.

GitHub Actions (./.github/workflows/build-and-test.yml): GitHub Actions is a continuous integration and continuous delivery (CI/CD) platform that lets you automate your build, test, and deployment pipeline. You can create workflows that build and test every pull request to your repository, or deploy merged pull requests to production.

    # -- /etc/systemd/system/vector_interface_api.service
    [Unit]
    Description=Swagger ES Service

    [Service]
    User=devuser
    Group=devuser
    Type=simple
    ExecStart=/bin/bash /home/devuser/Git_Repo/service-start.sh
    ExecStop=/usr/bin/killall vector_interface_api

    [Install]
    WantedBy=default.target

    # Service command
    sudo systemctl daemon-reload
    sudo systemctl start vector_interface_api.service
    sudo systemctl status vector_interface_api.service
    sudo systemctl stop vector_interface_api.service
    sudo service vector_interface_api status/stop/start