modelz llm
23.07.4
Modelz LLM is an inference server that makes it easy to run open source large language models (LLMs), such as FastChat, LLaMA, and ChatGLM, in local or cloud-based environments behind an OpenAI-compatible API.
pip install modelz-llm
# or install from source
pip install git+https://github.com/tensorchord/modelz-llm.git[gpu]
Please first start the self-hosted API server by following the instructions:
modelz-llm -m bigscience/bloomz-560m --device cpu
Currently, we support the following models:
| Model Name | Huggingface Model | Docker Image | Recommended GPU |
|---|---|---|---|
| FastChat T5 | lmsys/fastchat-t5-3b-v1.0 | modelzai/llm-fastchat-t5-3b | Nvidia L4(24GB) |
| Vicuna 7B Delta V1.1 | lmsys/vicuna-7b-delta-v1.1 | modelzai/llm-vicuna-7b | Nvidia A100(40GB) |
| LLaMA 7B | decapoda-research/llama-7b-hf | modelzai/llm-llama-7b | Nvidia A100(40GB) |
| ChatGLM 6B INT4 | THUDM/chatglm-6b-int4 | modelzai/llm-chatglm-6b-int4 | Nvidia T4(16GB) |
| ChatGLM 6B | THUDM/chatglm-6b | modelzai/llm-chatglm-6b | Nvidia L4(24GB) |
| Bloomz 560M | bigscience/bloomz-560m | modelzai/llm-bloomz-560m | CPU |
| Bloomz 1.7B | bigscience/bloomz-1b7 | | CPU |
| Bloomz 3B | bigscience/bloomz-3b | | Nvidia L4(24GB) |
| Bloomz 7.1B | bigscience/bloomz-7b1 | | Nvidia A100(40GB) |
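Loading a model can take a while, so it may help to wait until the server started above is accepting connections before sending requests. The following is a minimal sketch using only the Python standard library. The helper name `wait_for_server` is hypothetical (it is not part of modelz-llm), and the default port 8000 is an assumption taken from the `api_base` used in the examples in this README.

```python
import socket
import time


def wait_for_server(host="localhost", port=8000, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper, not part of modelz-llm. An open port only shows that
    the process is listening; it does not prove the model has finished loading.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Try to open (and immediately close) a TCP connection.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_server()` right after launching `modelz-llm` returns `True` as soon as the port is reachable, or `False` if nothing is listening after 30 seconds.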
Then you can use the OpenAI Python SDK to interact with the model:
import openai
# point the SDK at the local modelz-llm server; the key is a placeholder value
openai.api_base = "http://localhost:8000"
openai.api_key = "any"
# create a chat completion
chat_completion = openai.ChatCompletion.create(
    model="any",
    messages=[{"role": "user", "content": "Hello world"}],
)
You can also integrate modelz-llm with LangChain:
import openai
from langchain.llms import OpenAI

# point the SDK at the local modelz-llm server; the key is a placeholder value
openai.api_base = "http://localhost:8000"
openai.api_key = "any"

llm = OpenAI()
llm.generate(prompts=["Could you please recommend some movies?"])
You can also deploy modelz-llm directly on Modelz.
Modelz LLM supports the following APIs for interacting with open source large language models:
- /completions
- /chat/completions
- /embeddings
- /engines/<any>/embeddings
- /v1/completions
- /v1/chat/completions
- /v1/embeddings
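Because these are plain HTTP endpoints, they can also be called without the OpenAI SDK. The sketch below uses only the Python standard library to build a request for `/chat/completions` and read the reply. It rests on several assumptions: the server listens on `http://localhost:8000` as in the SDK examples, the request and response bodies follow the OpenAI chat completion schema, and no authorization header is required. The helper names `build_chat_request`, `first_reply`, and `chat` are hypothetical and not part of modelz-llm.

```python
import json
import urllib.request

# Assumption: same address as the api_base used in the SDK examples.
BASE_URL = "http://localhost:8000"


def build_chat_request(prompt, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /chat/completions endpoint."""
    payload = {
        "model": "any",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_reply(response):
    """Extract the assistant text, assuming an OpenAI-style response body."""
    return response["choices"][0]["message"]["content"]


def chat(prompt, base_url=BASE_URL):
    """Send the request to a running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        return first_reply(json.load(resp))
```

With a server running, `chat("Hello world")` should return the model's reply as a string. Building the request separately from sending it makes the payload easy to inspect or log before it goes over the wire.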