# ModelZ LLM

23.07.4
ModelZ LLM is an inference server that lets you serve open source large language models (LLMs), such as FastChat, LLaMA, and ChatGLM, in local or cloud-based environments through an OpenAI-compatible API.
```shell
pip install modelz-llm
# or install from source
pip install git+https://github.com/tensorchord/modelz-llm.git[gpu]
```

Please first start the self-hosted API server by following the instructions:

```shell
modelz-llm -m bigscience/bloomz-560m --device cpu
```

Currently, we support the following models:
| Model Name | Hugging Face Model | Docker Image | Recommended GPU |
|---|---|---|---|
| FastChat T5 | lmsys/fastchat-t5-3b-v1.0 | modelzai/llm-fastchat-t5-3b | NVIDIA L4 (24GB) |
| Vicuna 7B Delta V1.1 | lmsys/vicuna-7b-delta-v1.1 | modelzai/llm-vicuna-7b | NVIDIA A100 (40GB) |
| LLaMA 7B | decapoda-research/llama-7b-hf | modelzai/llm-llama-7b | NVIDIA A100 (40GB) |
| ChatGLM 6B INT4 | THUDM/chatglm-6b-int4 | modelzai/llm-chatglm-6b-int4 | NVIDIA T4 (16GB) |
| ChatGLM 6B | THUDM/chatglm-6b | modelzai/llm-chatglm-6b | NVIDIA L4 (24GB) |
| Bloomz 560M | bigscience/bloomz-560m | modelzai/llm-bloomz-560m | CPU |
| Bloomz 1.7B | bigscience/bloomz-1b7 | | CPU |
| Bloomz 3B | bigscience/bloomz-3b | | NVIDIA L4 (24GB) |
| Bloomz 7.1B | bigscience/bloomz-7b1 | | NVIDIA A100 (40GB) |
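For scripting around these models, the table above can be captured as a small lookup. This is a minimal sketch of our own; the dictionary and helper function are illustrative and not part of the modelz-llm package:

```python
# Hypothetical helper: recommended hardware per supported model,
# transcribed from the table above (not part of modelz-llm itself).
RECOMMENDED_DEVICE = {
    "lmsys/fastchat-t5-3b-v1.0": "NVIDIA L4 (24GB)",
    "lmsys/vicuna-7b-delta-v1.1": "NVIDIA A100 (40GB)",
    "decapoda-research/llama-7b-hf": "NVIDIA A100 (40GB)",
    "THUDM/chatglm-6b-int4": "NVIDIA T4 (16GB)",
    "THUDM/chatglm-6b": "NVIDIA L4 (24GB)",
    "bigscience/bloomz-560m": "CPU",
    "bigscience/bloomz-1b7": "CPU",
    "bigscience/bloomz-3b": "NVIDIA L4 (24GB)",
    "bigscience/bloomz-7b1": "NVIDIA A100 (40GB)",
}

def recommended_device(model_name: str) -> str:
    """Return the recommended hardware for a supported model name."""
    return RECOMMENDED_DEVICE.get(model_name, "unknown model")

print(recommended_device("bigscience/bloomz-560m"))  # CPU
```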
Then you can use the OpenAI Python SDK to interact with the model:

```python
import openai

openai.api_base = "http://localhost:8000"
openai.api_key = "any"

# create a chat completion
chat_completion = openai.ChatCompletion.create(
    model="any",
    messages=[{"role": "user", "content": "Hello world"}],
)
```

You can also integrate ModelZ LLM with LangChain:
```python
import openai
from langchain.llms import OpenAI

openai.api_base = "http://localhost:8000"
openai.api_key = "any"

llm = OpenAI()
llm.generate(prompts=["Could you please recommend some movies?"])
```

You can also deploy ModelZ LLM directly on ModelZ.
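Once a chat completion comes back, the reply sits inside the standard OpenAI response shape (`choices` → `message` → `content`). A minimal sketch of our own for pulling it out, shown here against a stubbed response so it runs without a server:

```python
# Minimal sketch (our helper, not part of modelz-llm): extract the
# assistant reply from an OpenAI-compatible chat completion response.
def extract_reply(chat_completion: dict) -> str:
    """Return the assistant's message text from a chat completion."""
    return chat_completion["choices"][0]["message"]["content"]

# Stubbed response in the expected OpenAI shape:
response = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hello! How can I help?"}}
    ]
}
print(extract_reply(response))  # Hello! How can I help?
```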
ModelZ LLM supports the following APIs for interacting with open source large language models:

- `/completions`
- `/chat/completions`
- `/embeddings`
- `/engines/<any>/embeddings`
- `/v1/completions`
- `/v1/chat/completions`
- `/v1/embeddings`
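Since both the `/v1`-prefixed and unprefixed paths are served, a client only needs to pick one. A minimal sketch, assuming the standard OpenAI embeddings payload (`model` and `input` fields); the helper function is ours, not part of modelz-llm:

```python
# Hypothetical helper (not part of modelz-llm): build the URL and JSON
# payload for an embeddings request against a self-hosted server.
def build_embeddings_request(base_url: str, text: str, use_v1: bool = True):
    """Return (url, payload) for the embeddings endpoint."""
    path = "/v1/embeddings" if use_v1 else "/embeddings"
    return base_url.rstrip("/") + path, {"model": "any", "input": text}

url, payload = build_embeddings_request("http://localhost:8000", "Hello world")
print(url)  # http://localhost:8000/v1/embeddings
# The request itself could then be sent with e.g. requests.post(url, json=payload).
```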