Chroma Framework

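Overview

Chroma Framework is a Python application for managing and searching text embeddings with sentence-transformer models. It lets users create text embeddings, add new documents, and query a collection for the texts closest to an input query.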

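Features

Embedding management -> Create and manage collections of text embeddings.

Document addition -> Add new documents together with their metadata.

Text search -> Find the texts closest to a given query using an embedding model.

Dynamic path handling -> Automatically determine file paths relative to the project directory.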
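
To make "closest to a given query" concrete, here is a minimal, self-contained sketch of similarity search. It is not the project's implementation: it swaps the sentence-transformer model for a toy bag-of-words vector and ranks documents by cosine similarity. The function names and sample documents are made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_texts(query, documents, n_results=2):
    """Return the names of the n_results documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        documents,
        key=lambda name: cosine_similarity(query_vec, embed(documents[name])),
        reverse=True,
    )
    return ranked[:n_results]


# Hypothetical sample documents, keyed by file name.
docs = {
    "cats.txt": "cats purr and chase mice",
    "dogs.txt": "dogs bark and chase cats",
    "cars.txt": "cars need fuel and tires",
}
print(closest_texts("why do cats purr", docs, n_results=1))
```

A real embedding model would also match texts that share meaning but not exact words, which this word-count stand-in cannot do.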

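Installation

Clone the repository:
```
git clone https://github.com/yourusername/chromadb_framework
```
Navigate to the project directory:
```
cd chromadb_framework
```
Install any required dependencies (if applicable):
```
pip install -r requirements.txt
```

Usage

Make sure Python 3.x is installed.
Run the application by executing:
```
python main.py
```
Follow the on-screen prompts to manage embeddings and search texts.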

Project Structure

```
project-root
├── config
│   ├── __init__.py
│   └── constants.py
│
├── src
│   ├── __init__.py
│   ├── client.py
│   ├── collection.py
│   └── data.py
│
├── utils
│   ├── __init__.py
│   └── helpers.py
│
├── .gitignore
├── .gitattributes
└── main.py
```

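config/: contains configuration files.
- __init__.py: imports the constants used for model and collection configuration.
- constants.py: defines constants used throughout the application.
src/: contains the source code files.
- __init__.py: initializes the source package and sets up logging.
- client.py: functions for creating the database client.
- collection.py: functions for managing collections and searching texts.
- data.py: functions for retrieving data from a specified folder.
utils/: contains utility functions.
- __init__.py: imports the helper functions.
- helpers.py: utility functions for setting up the model and resolving paths.
.gitignore: specifies files and directories that git should ignore (e.g., virtual environments, build artifacts).
main.py: the application's entry point. Initializes the settings, runs the embedding operations, and manages the text search.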

Code Examples

Main Program

```python
from config.constants import MODEL_NAME, COLLECTION_NAME, INPUT_QUERY
from src.client import get_client
from src.collection import get_or_create_collection, add_collection, find_closest_texts
from src.data import get_data
from utils.helpers import set_def_llm, get_path


def main():
    model_name = MODEL_NAME
    collection_name = COLLECTION_NAME
    input_query = INPUT_QUERY
    # Set up the database client, the data folder, and the embedding function.
    my_client = get_client()
    my_folder_path = get_path()
    embedding_function = set_def_llm(model_name)
    my_collection = get_or_create_collection(my_client, collection_name, embedding_function=embedding_function)
    # Load the text files and add them to the collection.
    my_documents, my_metadatas, my_ids = get_data(my_folder_path)
    add_collection(my_collection, my_documents, my_metadatas, my_ids)
    # Look up the texts closest to the query.
    my_closest_texts = find_closest_texts(my_collection, input_query)
    print("Closest text(s):", my_closest_texts)


if __name__ == "__main__":
    main()
```
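
main.py imports MODEL_NAME, COLLECTION_NAME, and INPUT_QUERY from config/constants.py, but that file's contents are not shown here. A minimal sketch of what it might look like follows. All three values are assumptions chosen for illustration, not the project's actual settings.

```python
# config/constants.py (hypothetical contents; the real values are not shown in this README)

# Name of the sentence-transformers model passed to set_def_llm().
MODEL_NAME = "all-MiniLM-L6-v2"  # assumed value

# Name of the collection that main.py creates or reopens.
COLLECTION_NAME = "texts"  # assumed value

# Query whose closest texts main.py looks up.
INPUT_QUERY = "What is a vector database?"  # assumed value
```

Keeping these values in one module means the model, the collection, or the query can be changed without touching main.py.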

Utility Functions

helpers.py: utility functions for setting up the model and resolving paths.

```python
from os.path import abspath, dirname, join
from chromadb.utils import embedding_functions


def set_def_llm(model_name=None):
    """Return an embedding function for model_name, or the default one if no name is given."""
    try:
        if model_name:
            return embedding_functions.SentenceTransformerEmbeddingFunction(model_name=model_name)
        else:
            return embedding_functions.DefaultEmbeddingFunction()
    except Exception as e:
        print(f"An error occurred while setting the sentence transformer:\n{e}")
        return None


def get_path(folder_name="texts"):
    """Return the path of folder_name inside the project root (the parent of utils/)."""
    try:
        current_path = dirname(abspath(__file__))
        project_path = dirname(current_path)
        full_path = join(project_path, folder_name)
        return full_path
    except Exception as e:
        print(f"An error occurred while getting the folder path:\n{e}")
```
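
The path logic in get_path climbs two levels from helpers.py (first to utils/, then to the project root) and appends the folder name. The sketch below traces that on a made-up file location. The function name and the example path are hypothetical, and PurePosixPath is used only so the result is the same on every platform; real code would start from `__file__`.

```python
from pathlib import PurePosixPath


def folder_next_to_package(module_file, folder_name="texts"):
    """Return <project root>/<folder_name> for a module one level below the root."""
    module_path = PurePosixPath(module_file)
    package_dir = module_path.parent   # e.g. .../utils
    project_root = package_dir.parent  # e.g. the directory containing utils/
    return project_root / folder_name


# Hypothetical install location of helpers.py.
print(folder_next_to_package("/srv/chromadb_framework/utils/helpers.py"))
```

Because the result depends on where the module file lives rather than on the current working directory, the data folder is found no matter where `python main.py` is launched from.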

Client Creation

client.py: functions for creating the database client.

```python
from chromadb import PersistentClient


def get_client(path="vector_db"):
    """Create a client that persists its data under the given directory."""
    try:
        client = PersistentClient(path=path)
        return client
    except FileNotFoundError:
        print(f"Database directory not found: {path}")
    except Exception as e:
        print(f"An error occurred while creating the client: {e}")
```

Collection Management

collection.py: functions for managing collections and searching texts.

```python
def get_or_create_collection(client, name, embedding_function):
    """Open the named collection, creating it if it does not exist yet."""
    try:
        return client.get_or_create_collection(name=name, embedding_function=embedding_function)
    except Exception as e:
        print(f"An error occurred while creating the collection: {e}")


def add_collection(collection, documents, metadatas, ids):
    """Add documents with their metadata and ids to the collection."""
    try:
        collection.add(
            documents=documents,
            metadatas=metadatas,
            ids=ids,
        )
    except Exception as e:
        print(f"An error occurred while adding to the collection: {e}")


def find_closest_texts(collection, input_query, n_results=2):
    """Return the source file names of the n_results texts closest to input_query."""
    try:
        closest_text_names = list()
        results = collection.query(
            query_texts=[input_query],
            include=["metadatas"],
            n_results=n_results,
        )
        # results["metadatas"] holds one list per query text; only one query is sent.
        for item in results["metadatas"][0]:
            closest_text_names.append(item["source"])
        return closest_text_names
    except Exception as e:
        print(f"An error occurred while finding the closest text: {e}")
```
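
find_closest_texts indexes the query result as `results["metadatas"][0]`, which implies a nested layout: one inner list per query text, each holding one metadata dictionary per match. The sketch below shows that extraction on a hand-written stand-in dictionary, so it runs without a database. The ids and file names are made up, and the layout is assumed from how the code above indexes the result.

```python
# Hand-written stand-in for a query result; ids and file names are illustrative.
results = {
    "ids": [["id3", "id1"]],
    "metadatas": [[{"source": "cats.txt"}, {"source": "dogs.txt"}]],
}

# One inner list per query text; index 0 selects the only query that was sent.
closest_text_names = [item["source"] for item in results["metadatas"][0]]
print(closest_text_names)
```

The "source" key is available only because get_data stores the file name under that key when it builds the metadata.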

Data Preparation

data.py: functions for retrieving data from a specified folder.

```python
from os import listdir
from os.path import join


def get_data(folder_path):
    """Read every .txt file in folder_path and return (documents, metadatas, ids)."""
    try:
        documents = list()
        metadatas = list()
        ids = list()
        id_count = 1

        for file_name in listdir(folder_path):
            if file_name.endswith(".txt"):
                file_path = join(folder_path, file_name)
                doc_id = "id" + str(id_count)  # renamed so the built-in id() is not shadowed
                with open(file_path, encoding="utf-8") as file:
                    content = file.read()
                    documents.append(content)
                    metadatas.append({"source": file_name})
                    ids.append(doc_id)
                id_count += 1
        return documents, metadatas, ids
    except Exception as e:
        print(f"An error occurred while creating the data: {e}")
        return [], [], []
```
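
The three lists returned by get_data are parallel: position i in each list describes the same file. The sketch below runs a variant of the function on a throwaway folder to show their shape. The variant (get_data_sorted, a hypothetical name) visits files in sorted order, because os.listdir returns entries in arbitrary order and the generated ids could otherwise differ between runs.

```python
import tempfile
from os import listdir
from os.path import join


def get_data_sorted(folder_path):
    """Variant of get_data that visits .txt files in sorted order for stable ids."""
    documents, metadatas, ids = [], [], []
    txt_files = sorted(f for f in listdir(folder_path) if f.endswith(".txt"))
    for count, file_name in enumerate(txt_files, start=1):
        with open(join(folder_path, file_name), encoding="utf-8") as file:
            documents.append(file.read())
        metadatas.append({"source": file_name})
        ids.append(f"id{count}")
    return documents, metadatas, ids


# Build a throwaway folder with two text files and one file that is skipped.
with tempfile.TemporaryDirectory() as folder:
    for name, text in [("b.txt", "second"), ("a.txt", "first"), ("notes.md", "ignored")]:
        with open(join(folder, name), "w", encoding="utf-8") as file:
            file.write(text)
    documents, metadatas, ids = get_data_sorted(folder)

print(documents)
print(metadatas)
print(ids)
```

Stable ids matter when the same folder is loaded more than once, since the ids decide which stored entry each file corresponds to.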