booster
1.0.0

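Booster, according to the Merriam-Webster dictionary:
Large Model Booster aims to be a simple and powerful LLM inference accelerator, both for those who need to scale GPTs in a production environment and for those who just want to experiment with models on their own.
Within the first month of llama.go development, I was shocked by how clearly the original ggml.cpp project showed that there are no limits on what talented people can do to bring mind-blowing features and move toward the future of AI.
So I decided to start a new project in which a best-in-class C++ / CUDA core is embedded into a robust Golang server, ready for reliable and performant inference at large scale in real production environments.
Booster was (and still is) developed on a Mac with an Apple Silicon M1 processor, so building there is really easy: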
make mac

Follow step 1 and step 2, then just run make!

Ubuntu Step 1: Install the C++ and Golang compilers, plus some developer libraries
sudo apt update -y && sudo apt upgrade -y &&
sudo apt install -y git git-lfs make build-essential &&
wget https://golang.org/dl/go1.21.5.linux-amd64.tar.gz &&
sudo tar -xf go1.21.5.linux-amd64.tar.gz -C /usr/local &&
rm go1.21.5.linux-amd64.tar.gz &&
echo 'export PATH="${PATH}:/usr/local/go/bin"' >> ~/.bashrc && source ~/.bashrc

Ubuntu Step 2: Install the NVIDIA drivers and CUDA Toolkit 12.2 with NVCC
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin &&
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600 &&
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub &&
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /" &&
sudo apt update -y &&
sudo apt install -y cuda-toolkit-12-2
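
Now you are ready to rock!
make cuda

To run Booster, go through the steps below.

Build the server from source:
make clean && make mac

Download the model:
wget https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf

Create a configuration file:
id: mac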
host: localhost
port: 8080
log: booster.log
deadline: 180

pods:
  gpu:
    model: hermes
    prompt: chat
    sampling: janus
    threads: 1
    gpus: [ 100 ]
    batch: 512

models:
  hermes:
    name: Hermes2 Pro 8B
    path: ~/models/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
    context: 8K
    predict: 1K

prompts:
  chat:
    locale: en_US
    prompt: "Today is {DATE}. You are virtual assistant. Please answer the question."
    system: "<|im_start|>system\n{PROMPT}<|im_end|>"
    user: "\n<|im_start|>user\n{USER}<|im_end|>"
    assistant: "\n<|im_start|>assistant\n{ASSISTANT}<|im_end|>"

samplings:
  janus:
    janus: 1
    depth: 200
    scale: 0.97
    hi: 0.99
    lo: 0.96

Start Booster in interactive mode to chat with the model directly:
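./booster

Start Booster as a server that handles all API endpoints and prints debug information:
./booster --server --debug

Send a JSON document with a unique ID and your prompt to http://localhost:8080/jobs:
{
  "id": "5fb8ebd0-e0c9-4759-8f7d-35590f6c9fc6",
  "prompt": "Who are you?"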
}

The job result is then available at http://localhost:8080/jobs/5fb8ebd0-e0c9-4759-8f7d-35590f6c9fc6:
{
  "id": "5fb8ebd0-e0c9-4759-8f7d-35590f6c9fc6",
  "output": "I'm a virtual assistant.",
  "prompt": "Who are you?",
  "status": "finished"
}

See the instructions in the booster.service file to learn how to create a daemon service from this API server.
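
The job submission and result lookup shown above could be driven from a small Go client along these lines. This is a sketch under stated assumptions, not Booster's own client: the README shows only the URLs and JSON bodies, so using POST to submit and GET to fetch the result is assumed, as is polling until `status` equals `"finished"` (the only status value shown). The names `Job`, `newJobID`, `encodeJob`, `decodeJob` and `submitAndWait` are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// Job mirrors the JSON fields shown in the request and response examples.
type Job struct {
	ID     string `json:"id"`
	Prompt string `json:"prompt"`
	Output string `json:"output,omitempty"`
	Status string `json:"status,omitempty"`
}

// newJobID returns a random UUID-v4-style string, the same shape as the example ID.
func newJobID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// encodeJob builds the request body; only id and prompt are sent.
func encodeJob(id, prompt string) ([]byte, error) {
	return json.Marshal(Job{ID: id, Prompt: prompt})
}

// decodeJob parses a job document returned by the server.
func decodeJob(data []byte) (Job, error) {
	var j Job
	err := json.Unmarshal(data, &j)
	return j, err
}

// submitAndWait submits a prompt, then polls until the job is finished or the wait expires.
// ASSUMPTION: POST /jobs to submit and GET /jobs/{id} to read the result.
func submitAndWait(c *http.Client, baseURL, prompt string, wait time.Duration) (Job, error) {
	id, err := newJobID()
	if err != nil {
		return Job{}, err
	}
	body, err := encodeJob(id, prompt)
	if err != nil {
		return Job{}, err
	}
	resp, err := c.Post(baseURL+"/jobs", "application/json", bytes.NewReader(body))
	if err != nil {
		return Job{}, err
	}
	resp.Body.Close()
	if resp.StatusCode >= 400 {
		return Job{}, fmt.Errorf("submit failed: %s", resp.Status)
	}
	stop := time.Now().Add(wait)
	for time.Now().Before(stop) {
		resp, err := c.Get(baseURL + "/jobs/" + id)
		if err != nil {
			return Job{}, err
		}
		data, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return Job{}, err
		}
		// Any other status (or an unparsable body) is treated as "not ready yet".
		if job, err := decodeJob(data); err == nil && job.Status == "finished" {
			return job, nil
		}
		time.Sleep(500 * time.Millisecond)
	}
	return Job{}, fmt.Errorf("job %s not finished after %s", id, wait)
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	job, err := submitAndWait(client, "http://localhost:8080", "Who are you?", time.Minute)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(job.Output)
}
```

The encoding and decoding helpers are kept separate from the HTTP calls so that the JSON shapes can be checked without a running server. The one-minute wait and half-second polling interval are arbitrary illustration values.
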