# ModelZ LLM

23.07.4
ModelZ LLM is an inference server that lets you serve open-source large language models (LLMs) such as FastChat, LLaMA, and ChatGLM, either locally or in the cloud, behind an OpenAI-compatible API.
```shell
pip install modelz-llm
# or install from source
pip install git+https://github.com/tensorchord/modelz-llm.git[gpu]
```

Please first start the self-hosted API server by following the instructions:

```shell
modelz-llm -m bigscience/bloomz-560m --device cpu
```

Currently, we support the following models:
| Model Name | Hugging Face Model | Docker Image | Recommended GPU |
|---|---|---|---|
| FastChat T5 | lmsys/fastchat-t5-3b-v1.0 | modelzai/llm-fastchat-t5-3b | NVIDIA L4 (24GB) |
| Vicuna 7B Delta V1.1 | lmsys/vicuna-7b-delta-v1.1 | modelzai/llm-vicuna-7b | NVIDIA A100 (40GB) |
| LLaMA 7B | decapoda-research/llama-7b-hf | modelzai/llm-llama-7b | NVIDIA A100 (40GB) |
| ChatGLM 6B INT4 | THUDM/chatglm-6b-int4 | modelzai/llm-chatglm-6b-int4 | NVIDIA T4 (16GB) |
| ChatGLM 6B | THUDM/chatglm-6b | modelzai/llm-chatglm-6b | NVIDIA L4 (24GB) |
| Bloomz 560M | bigscience/bloomz-560m | modelzai/llm-bloomz-560m | CPU |
| Bloomz 1.7B | bigscience/bloomz-1b7 | | CPU |
| Bloomz 3B | bigscience/bloomz-3b | | NVIDIA L4 (24GB) |
| Bloomz 7.1B | bigscience/bloomz-7b1 | | NVIDIA A100 (40GB) |
Then you can interact with the model using the OpenAI Python SDK:
```python
import openai

openai.api_base = "http://localhost:8000"
openai.api_key = "any"

# create a chat completion
chat_completion = openai.ChatCompletion.create(
    model="any",
    messages=[{"role": "user", "content": "Hello world"}],
)
```

You can also integrate modelz-llm with LangChain:
```python
import openai

openai.api_base = "http://localhost:8000"
openai.api_key = "any"

from langchain.llms import OpenAI

llm = OpenAI()
llm.generate(prompts=["Could you please recommend some movies?"])
```

You can also deploy modelz-llm directly on ModelZ.
ModelZ LLM supports the following APIs for interacting with open-source large language models:
- `/completions`
- `/chat/completions`
- `/embeddings`
- `/engines/<any>/embeddings`
- `/v1/completions`
- `/v1/chat/completions`
- `/v1/embeddings`
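Since these endpoints follow the OpenAI wire format, any HTTP client can call them directly without the SDK. This is a hedged sketch using only the standard library; the `chat` helper is illustrative and the `model` field is a placeholder that the server accepts:

```python
import json
import urllib.request


def chat(messages, api_base="http://localhost:8000"):
    # POST an OpenAI-style chat completion request directly to the
    # server's /chat/completions endpoint.
    payload = json.dumps({"model": "any", "messages": messages}).encode()
    req = urllib.request.Request(
        api_base + "/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())


# reply = chat([{"role": "user", "content": "Hello world"}])  # requires a running server
```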