booster


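According to the Merriam-Webster dictionary, a booster is:

an auxiliary device for increasing force, power, pressure, or effectiveness
the first stage of a multistage rocket, providing thrust for the launching and the initial part of the flight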

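Large Model Booster aims to be a simple yet powerful LLM inference accelerator, both for those who need to scale GPTs in a production environment and for those who just want to experiment with models on their own.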

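Superpowers

Built with performance and scaling in mind, thanks to Golang and C++
No more problems with Python dependencies
CPU-only inference if needed: any Intel or AMD x64, ARM64, and Apple Silicon
GPUs are supported as well: NVIDIA CUDA, Apple Metal, and even OpenCL cards
Split really big models across several GPUs (run Llama 70B on 2x RTX 3090)
Solid performance on CPU-only machines, fast inference on monsters with beefy GPUs
Supports regular FP16/FP32 models and their quantized versions: 4-bit really rocks!
Popular LLM architectures are already there: Llama, Mistral, Gemma, and more
Special bonus: SOTA Janus Sampling for code generation and non-English languages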
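The 2x RTX 3090 claim makes more sense with some back-of-the-envelope arithmetic. The sketch below estimates weight memory as parameters × bits per weight / 8. It counts weights only and ignores the KV cache, activations, and the extra bits that K-quant formats such as Q4_K_M spend on scales, so real usage is noticeably higher. The 24 GB per card is the RTX 3090's memory size. The even split between cards is an assumption.

```go
package main

import "fmt"

// weightBytes estimates memory for model weights alone:
// parameters × bits per weight / 8. KV cache, activations and
// quantization scale overhead are deliberately not included.
func weightBytes(params float64, bitsPerWeight float64) float64 {
	return params * bitsPerWeight / 8
}

func main() {
	const params = 70e9 // Llama 70B
	const gpuMem = 24e9 // RTX 3090: 24 GB per card (decimal GB, conservative)
	for _, bits := range []float64{16, 8, 4} {
		total := weightBytes(params, bits)
		perGPU := total / 2 // assume an even split across two cards
		fmt.Printf("%2.0f-bit: %.0f GB total, %.1f GB per GPU, fits 2x3090: %v\n",
			bits, total/1e9, perGPU/1e9, perGPU <= gpuMem)
	}
}
```

At 16 and 8 bits the weights alone (140 GB and 70 GB) exceed the 48 GB of two cards. At 4 bits they come to about 35 GB, roughly 17.5 GB per card, which leaves headroom for quantization overhead and the KV cache.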

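Motivation

Within the first month of llama.go development, I was shocked by how clearly the original ggml.cpp project showed that there are no limits for talented people bringing mind-blowing features and moving toward the AI future.

So I decided to start a new project in which a best-in-class C++ / CUDA core is embedded into a powerful Golang server, ready for robust and performant inference at scale in real production environments.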

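V3 Roadmap - Summer '24

Rebrand the project again :) Collider => Booster
Full Llama v3 and v3.1 support
OpenAI API Chat Completion compatible endpoint
Ollama compatible endpoints
Interactive mode for chatting from the command line
Update Janus Sampling for Llama-3
... and finally the V3 release!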

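V3+ Roadmap - Fall '24

Broader integration with the Ollama ecosystem
Smarter context extension when the limit is reached
Embedded web UI with no external dependencies
Native Windows binaries
Prebuilt binaries for all platforms
Support for LLaVA multimodal inference
Better code test coverage
Perplexity computation for benchmarking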

How to build on Mac?

Booster was (and still is) developed on a Mac with an Apple Silicon M1 processor, so it's really easy:

make mac

How to compile for CUDA on Ubuntu?

Follow Step 1 and Step 2, then just make!

Ubuntu Step 1: Install the C++ and Golang compilers, as well as some developer libraries

sudo apt update -y && sudo apt upgrade -y &&
sudo apt install -y git git-lfs make build-essential &&
wget https://golang.org/dl/go1.21.5.linux-amd64.tar.gz &&
sudo tar -xf go1.21.5.linux-amd64.tar.gz -C /usr/local &&
rm go1.21.5.linux-amd64.tar.gz &&
echo 'export PATH="${PATH}:/usr/local/go/bin"' >> ~/.bashrc && source ~/.bashrc

Ubuntu Step 2: Install the NVIDIA drivers and CUDA Toolkit 12.2 with NVCC

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin &&
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600 && 
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub && 
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /" && 
sudo apt update -y && 
sudo apt install -y cuda-toolkit-12-2

Now you are ready to rock!

make cuda

How to run?

Go through the steps below:

Build the server from source (Mac inference as an example):

make clean && make mac

Download a model, for example Hermes 2 Pro, which is based on Llama-3-8B and quantized to the GGUF Q4_K_M format:

wget https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
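A failed or interrupted download can leave behind an HTML error page or a truncated file. The Go sketch below is a quick sanity check before pointing Booster at the file. It assumes the GGUF layout in which a file begins with the 4-byte magic GGUF followed by a little-endian uint32 format version. The parseGGUFHeader helper and the default file name are illustrative and not part of Booster.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"io"
	"os"
)

// parseGGUFHeader checks the 4-byte "GGUF" magic and returns the
// little-endian uint32 format version stored right after it.
func parseGGUFHeader(header []byte) (uint32, error) {
	if len(header) < 8 {
		return 0, errors.New("header too short: file is truncated or empty")
	}
	if string(header[:4]) != "GGUF" {
		return 0, fmt.Errorf("bad magic %q: not a GGUF file", header[:4])
	}
	return binary.LittleEndian.Uint32(header[4:8]), nil
}

func main() {
	path := "Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf" // default: the file downloaded above
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	f, err := os.Open(path)
	if err != nil {
		fmt.Println("open:", err)
		return
	}
	defer f.Close()

	// Read only the first 8 bytes: magic + version.
	header := make([]byte, 8)
	if _, err := io.ReadFull(f, header); err != nil {
		fmt.Println("read:", err)
		return
	}
	version, err := parseGGUFHeader(header)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s looks like GGUF v%d\n", path, version)
}
```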

Create a configuration file and place it in the same directory (see config.sample.yaml):

id: mac
host: localhost
port: 8080
log: booster.log
deadline: 180

pods:

  gpu:
    model: hermes
    prompt: chat
    sampling: janus
    threads: 1
    gpus: [ 100 ]
    batch: 512

models:

  hermes:
    name: Hermes2 Pro 8B
    path: ~/models/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
    context: 8K
    predict: 1K

prompts:

  chat:
    locale: en_US
    prompt: "Today is {DATE}. You are a virtual assistant. Please answer the question."
    system: "<|im_start|>system\n{PROMPT}<|im_end|>"
    user: "\n<|im_start|>user\n{USER}<|im_end|>"
    assistant: "\n<|im_start|>assistant\n{ASSISTANT}<|im_end|>"

samplings:

  janus:
    janus: 1
    depth: 200
    scale: 0.97
    hi: 0.99
    lo: 0.96
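The chat prompt entries above use ChatML markers (<|im_start|>, <|im_end|>), the format Hermes 2 Pro expects. The Go sketch below shows one plausible way such a template expands for a single user turn. The Template type, the buildPrompt helper, the substitution order, and the date format are assumptions for illustration, not Booster's actual code.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Template mirrors the fields of the "chat" entry under prompts:.
type Template struct {
	Prompt    string
	System    string
	User      string
	Assistant string
}

// buildPrompt fills {DATE} into the system prompt, wraps it and the user
// message in their markers, then opens the assistant turn so the model
// continues from there. Hypothetical helper, not Booster's implementation.
func buildPrompt(tpl Template, date, user string) string {
	system := strings.ReplaceAll(tpl.Prompt, "{DATE}", date)
	var b strings.Builder
	b.WriteString(strings.Replace(tpl.System, "{PROMPT}", system, 1))
	b.WriteString(strings.Replace(tpl.User, "{USER}", user, 1))
	// Keep only the text before {ASSISTANT}: generation starts here.
	head, _, _ := strings.Cut(tpl.Assistant, "{ASSISTANT}")
	b.WriteString(head)
	return b.String()
}

func main() {
	chat := Template{
		Prompt:    "Today is {DATE}. You are a virtual assistant. Please answer the question.",
		System:    "<|im_start|>system\n{PROMPT}<|im_end|>",
		User:      "\n<|im_start|>user\n{USER}<|im_end|>",
		Assistant: "\n<|im_start|>assistant\n{ASSISTANT}<|im_end|>",
	}
	// The date format is an assumption; Booster may render {DATE} differently.
	fmt.Print(buildPrompt(chat, time.Now().Format("January 2, 2006"), "Who are you?"))
}
```

In a multi-turn chat, earlier assistant replies would presumably fill {ASSISTANT} in full. Only the final, still-open turn is cut at the placeholder.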

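When everything is in place, start Booster; running the server with debug output is a good way to make sure it works.

Launch Booster in interactive mode to just chat with the model: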

./booster

Launch Booster as a server that handles all API endpoints and shows debug info:

./booster --server --debug

Now use Booster through the Ollama / OpenAI APIs, or POST JSON to the native async API endpoint http://localhost:8080/jobs, for example:

{
    "id": "5fb8ebd0-e0c9-4759-8f7d-35590f6c9fc6",
    "prompt": "Who are you?"
}

Then fetch the result with an HTTP GET request to the native async API endpoint http://localhost:8080/jobs/5fb8ebd0-e0c9-4759-8f7d-35590f6c9fc6, which returns:

{
    "id": "5fb8ebd0-e0c9-4759-8f7d-35590f6c9fc6",
    "output": "I'm a virtual assistant.",
    "prompt": "Who are you?",
    "status": "finished"
}
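Because the API is asynchronous, a client typically polls until the job is done. The sketch below assumes the response shape shown above and treats "finished" as the only terminal status, since that is the only value documented here. The Job type, the waitForJob helper, the one-second interval, and the attempt limit are illustrative choices.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// Job mirrors the JSON returned by GET /jobs/{id}.
type Job struct {
	ID     string `json:"id"`
	Output string `json:"output"`
	Prompt string `json:"prompt"`
	Status string `json:"status"`
}

func parseJob(data []byte) (Job, error) {
	var j Job
	err := json.Unmarshal(data, &j)
	return j, err
}

// waitForJob polls until the job reports "finished" or attempts run out.
func waitForJob(baseURL, id string, interval time.Duration, attempts int) (Job, error) {
	for i := 0; i < attempts; i++ {
		resp, err := http.Get(baseURL + "/jobs/" + id)
		if err != nil {
			return Job{}, err
		}
		data, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return Job{}, err
		}
		job, err := parseJob(data)
		if err != nil {
			return Job{}, err
		}
		if job.Status == "finished" {
			return job, nil
		}
		time.Sleep(interval) // not done yet: wait before the next poll
	}
	return Job{}, fmt.Errorf("job %s not finished after %d attempts", id, attempts)
}

func main() {
	job, err := waitForJob("http://localhost:8080",
		"5fb8ebd0-e0c9-4759-8f7d-35590f6c9fc6", time.Second, 60)
	if err != nil {
		fmt.Println("poll failed:", err)
		return
	}
	fmt.Println(job.Output)
}
```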

See the instructions in the booster.service file to learn how to turn this API server into a daemon service.


Additional information

Version: 1.0.0
Type: Other source code
Updated: 2025-03-05
Size: 2.16MB
Source: GitHub
