booster
1.0.0

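Booster, according to the Merriam-Webster dictionary:
Large Model Booster aims to be a simple and mighty LLM inference accelerator, both for those who need to scale GPTs in a production environment and for those who just want to experiment with models on their own.
Within the first month of llama.go development, I was literally shocked by how clearly the original ggml.cpp project showed that there are no limits for talented people in bringing mind-blowing features and moving toward the AI future.
So I decided to start a new project in which a best-in-class C++ / CUDA core is embedded into a mighty Golang server, ready for robust and performant inference at large scale in real production environments.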
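Booster was (and still is) developed on a Mac with an Apple Silicon M1 processor, so building it there is really easy:
make mac
To build for CUDA on Ubuntu, follow Step 1 and Step 2, then just run make!
Ubuntu Step 1: Install the C++ and Golang compilers, as well as some developer libraries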
sudo apt update -y && sudo apt upgrade -y &&
sudo apt install -y git git-lfs make build-essential &&
wget https://golang.org/dl/go1.21.5.linux-amd64.tar.gz &&
sudo tar -xf go1.21.5.linux-amd64.tar.gz -C /usr/local &&
rm go1.21.5.linux-amd64.tar.gz &&
echo 'export PATH="${PATH}:/usr/local/go/bin"' >> ~/.bashrc && source ~/.bashrc
Ubuntu Step 2: Install the NVIDIA drivers and CUDA Toolkit 12.2 with NVCC
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin &&
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600 &&
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub &&
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /" &&
sudo apt update -y &&
sudo apt install -y cuda-toolkit-12-2
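Now you are ready to rock!
make cuda
To run Booster, go through the steps below.
Build the server from source (the Mac build is shown here):
make clean && make mac
Download a model in GGUF format, for example Hermes 2 Pro Llama 3 8B quantized to Q4_K_M:
wget https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
Create a configuration file:
id: mac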
host: localhost
port: 8080
log: booster.log
deadline: 180
pods:
  gpu:
    model: hermes
    prompt: chat
    sampling: janus
    threads: 1
    gpus: [ 100 ]
    batch: 512
models:
  hermes:
    name: Hermes2 Pro 8B
    path: ~/models/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
    context: 8K
    predict: 1K
prompts:
  chat:
    locale: en_US
    prompt: "Today is {DATE}. You are virtual assistant. Please answer the question."
    system: "<|im_start|>system\n{PROMPT}<|im_end|>"
    user: "\n<|im_start|>user\n{USER}<|im_end|>"
    assistant: "\n<|im_start|>assistant\n{ASSISTANT}<|im_end|>"
samplings:
  janus:
    janus: 1
    depth: 200
    scale: 0.97
    hi: 0.99
    lo: 0.96
Start Booster in interactive mode to just chat with the model:
./booster
Start Booster as a server that handles all API endpoints and prints debug information:
./booster --server --debug
Send a JSON body with a unique ID and your prompt to http://localhost:8080/jobs:
{
    "id": "5fb8ebd0-e0c9-4759-8f7d-35590f6c9fc6",
    "prompt": "Who are you?"
}
Then request http://localhost:8080/jobs/5fb8ebd0-e0c9-4759-8f7d-35590f6c9fc6 to see the result:
{
    "id": "5fb8ebd0-e0c9-4759-8f7d-35590f6c9fc6",
    "output": "I'm a virtual assistant.",
    "prompt": "Who are you?",
    "status": "finished"
}
See the instructions in the booster.service file to learn how to turn this API server into a daemon service.
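The job exchange shown above can also be scripted. The following is a minimal Python sketch, not part of Booster itself. It assumes the server accepts the job as an HTTP POST to /jobs and returns it on a GET to /jobs/<id>; the HTTP methods are inferred from the example and are not confirmed here. The helper names (build_job, submit_job, fetch_job, wait_for_job), the polling loop, and the 180-second default timeout are illustrative choices. Only the id, prompt, output, and status fields and the "finished" status come from the example.

```python
import json
import time
import urllib.request
import uuid

# Host and port as set in the sample configuration.
BASE_URL = "http://localhost:8080"


def build_job(prompt, job_id=None):
    """Return the job body: a unique ID plus the prompt text."""
    return {"id": job_id or str(uuid.uuid4()), "prompt": prompt}


def submit_job(job, base_url=BASE_URL):
    """Send the job to /jobs. Assumption: the server expects an HTTP POST."""
    request = urllib.request.Request(
        f"{base_url}/jobs",
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def fetch_job(job_id, base_url=BASE_URL):
    """Read the job from /jobs/<id>. Assumption: a plain GET returns JSON."""
    with urllib.request.urlopen(f"{base_url}/jobs/{job_id}") as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_job(job_id, timeout=180.0, interval=1.0, fetch=fetch_job):
    """Poll the job until its status is "finished" or the timeout expires.

    The fetch argument is a hypothetical hook so the loop can be tried
    without a running server.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job.get("status") == "finished":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout} seconds")
        time.sleep(interval)
```

With the server started via ./booster --server --debug, a call sequence such as job = build_job("Who are you?"), then submit_job(job), then wait_for_job(job["id"])["output"] should yield the model's answer. How the server reports failed jobs is not covered here, so the sketch simply times out in that case.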