qdurllm

Search your favorite websites and chat with them, on your desktop

Flowchart for qdurllm

qdurllm (Qdrant URLs and Large Language Models) is a local search engine that lets you select and upload URL content to a vector database. After that, you can search, retrieve and chat with this content.

It is provisioned as a multi-container Docker application, leveraging Qdrant, Langchain, llama.cpp, quantized Gemma and Gradio.

Demo!

Head over to the demo space on Hugging Face.

Requirements

The only requirement is to have docker and docker-compose installed.

If you don't have them, make sure to install them here.

Installation

You can install the application by cloning the GitHub repository:

git clone https://github.com/AstraBert/qdurllm.git
cd qdurllm

Or you can simply paste the following text into a compose.yaml file:

networks:
  mynet:
    driver: bridge
services:
  local-search-application:
    image: astrabert/local-search-application
    networks:
      - mynet
    ports:
      - "7860:7860"
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
    volumes:
      - "./qdrant_storage:/qdrant/storage"
    networks:
      - mynet
  llama_server:
    image: astrabert/llama.cpp-gemma
    ports:
      - "8000:8000"
    networks:
      - mynet

Place the file in whatever directory you want on your file system.

Before running the application, you can optionally pull all the required images from Docker Hub:

docker pull qdrant/qdrant
docker pull astrabert/llama.cpp-gemma
docker pull astrabert/local-search-application

How does it work?

When launched (see Usage), the application runs three containers:

qdrant (port 6333): serves as vector database provider for semantic search-based retrieval
llama.cpp-gemma (port 8000): this is an implementation of a quantized Gemma model provided by LMStudio and Google, served with llama.cpp server. It handles text generation, enriching the user's search experience.
local-search-application (port 7860): a Gradio tabbed interface with:
- The possibility to upload one or multiple contents by specifying their URLs (thanks to Langchain)
- The possibility to chat with the uploaded URLs thanks to llama.cpp-gemma
- The possibility to perform a direct search that leverages double-layered retrieval with all-MiniLM-L6-v2 (which identifies the 10 best matches) and sentence-t5-base (which re-encodes the 10 best matches and extracts the best hit from them). This is the same RAG implementation used in combination with llama.cpp-gemma. Want to see how double-layered RAG performs compared to single-layered RAG? Head over here!
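
The double-layered retrieval described in the last bullet can be illustrated with a minimal sketch. It is not the application's actual code: the vector database lookup is replaced by an in-memory brute-force ranking, and `two_stage_search`, `coarse_embed` and `fine_embed` are hypothetical names introduced here. The two encoders are passed in as plain functions, so the sketch runs without downloading any model.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Embedder = Callable[[str], Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_search(
    query: str,
    docs: List[str],
    coarse_embed: Embedder,  # stands in for the first encoder (assumption)
    fine_embed: Embedder,    # stands in for the second encoder (assumption)
    top_k: int = 10,
) -> Tuple[str, float]:
    """Return the best document and its second-stage similarity score."""
    if not docs:
        raise ValueError("docs must not be empty")

    # Stage 1: rank every document with the first encoder, keep the top_k.
    q1 = coarse_embed(query)
    shortlist = sorted(
        docs, key=lambda d: cosine(q1, coarse_embed(d)), reverse=True
    )[:top_k]

    # Stage 2: re-encode only the shortlist with the second encoder
    # and return the single best hit.
    q2 = fine_embed(query)
    scored = [(d, cosine(q2, fine_embed(d))) for d in shortlist]
    return max(scored, key=lambda pair: pair[1])
```

With the sentence-transformers package installed, the two placeholders could plausibly be filled with `SentenceTransformer("all-MiniLM-L6-v2").encode` and `SentenceTransformer("sentence-t5-base").encode`. The point of the design is that the second encoder only ever runs on the shortlist rather than on the whole collection.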

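The overall computational burden is light enough for the application to run not only without a GPU, but also with low RAM availability (>=8GB, although Gemma can take up to 10 minutes to respond on 8GB of RAM).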

Usage

Run it

You can make the application work with the following - really simple - command, which has to be run within the same directory where you stored your compose.yaml file:

docker compose up -d

If you've already pulled all the images, you'll find the application running at http://localhost:7860 or http://0.0.0.0:7860 in less than a minute.

If you have not pulled the images beforehand, you will have to wait for them to be downloaded before you can actually use the application.
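
Since startup time depends on whether the images are already present, a small polling script can tell you when the interface is reachable. The following is a minimal sketch using only the Python standard library. `wait_for_http` is a hypothetical helper, not part of qdurllm, and the timeout and interval defaults are arbitrary assumptions.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll `url` until it returns any HTTP response or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered, just not with a 2xx status: it is up.
            return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Connection refused, reset or timed out: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example (assumes the default port mapping from compose.yaml):
# wait_for_http("http://localhost:7860")
```

An HTTP request is used rather than a bare TCP connection because a published Docker port can accept connections before the service inside the container is ready to answer.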