


Welcome to libLLM, an open-source project designed for efficient inference of large language models (LLMs) on ordinary personal computers and mobile devices. The core is implemented in C++14 with no third-party dependencies (such as BLAS or SentencePiece), enabling seamless operation across a wide variety of devices.
| Model | Download | llm Command |
|---|---|---|
| Index-1.9B-Character (Role-playing) | [HF] [MS] | `llm chat -m index:character` |
| Index-1.9B-Chat | [HF] [MS] | `llm chat -m index` |
| Qwen2-1.5B-Instruct | [HF] [MS] | `llm chat -m qwen:1.5b` |
| Qwen2-7B-Instruct | [HF] [MS] | `llm chat -m qwen:7b` |
| Llama3.2-1B-Instruct | [HF] [MS] | `llm chat -m llama3.2:1b` |
| Llama3.2-3B-Instruct | [HF] [MS] | `llm chat -m llama3.2` |
| Whisper-large-v3 | [HF] [MS] | `llm transcribe -m whisper` |

HF = HuggingFace, MS = ModelScope
| OS | Platform | CUDA | avx2 | avx512 | asimdhp |
|---|---|---|---|---|---|
| Linux | x64 | ✅ | ✅ | ✅ | |
| Windows | x64 | ✅ | ✅ | ✅ | |
| macOS | arm64 | | | | ✅ |
To run and chat with Bilibili-Index-1.9B-Character:

```
$ llm chat -m index-character
```

This will automatically download the Bilibili-Index-1.9B-Character model from Huggingface, or from ModelScope for users in China, and start the chat CLI in llm. For example:
```
$ src/libllm/llm chat -m index-character
INFO 2024-07-30T12:02:28Z interface.cc:67] ISA support: AVX2=1 F16C=1 AVX512F=1
INFO 2024-07-30T12:02:28Z interface.cc:71] Use Avx512 backend.
INFO 2024-07-30T12:02:30Z matmul.cc:43] Use GEMM from cuBLAS.
INFO 2024-07-30T12:02:30Z cuda_operators.cc:51] cuda numDevices = 2
INFO 2024-07-30T12:02:30Z cuda_operators.cc:52] cuda:0 maxThreadsPerMultiProcessor = 2048
INFO 2024-07-30T12:02:30Z cuda_operators.cc:54] cuda:0 multiProcessorCount = 20
INFO 2024-07-30T12:02:30Z thread_pool.cc:73] ThreadPool started. numThreads=20
INFO 2024-07-30T12:02:30Z llm.cc:204] read model package: /home/xiaoych/.libllm/models/bilibili-index-1.9b-character-q4.llmpkg
INFO 2024-07-30T12:02:30Z model_for_generation.cc:43] model_type = index
INFO 2024-07-30T12:02:30Z model_for_generation.cc:44] device = cuda
INFO 2024-07-30T12:02:31Z state_map.cc:66] 220 tensors read.
Please input your question.
Type ':new' to start a new session (clean history).
Type ':sys <system_prompt>' to set the system prompt and start a new session.
> hi
您好!我是Index,请问有什么我可以帮助您的吗?
(12 tokens, time=0.76s, 63.47ms per token)
>
```
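The same downloaded package can also be driven from Python instead of the chat CLI, using the libllm Python API shown later in this README. A minimal sketch, assuming the package was saved under the default `~/.libllm/models` path that appears in the log above:

```python
from pathlib import Path

from libllm import Model, ControlToken

# Path taken from the session log above; adjust it to wherever the
# package landed on your machine (assumption, not a fixed location).
pkg = Path.home() / ".libllm/models/bilibili-index-1.9b-character-q4.llmpkg"
model = Model(str(pkg))

# <|reserved_0|> / <|reserved_1|> are the Index chat control tokens
# used by the Python example later in this README.
prompt = [ControlToken("<|reserved_0|>"), "hi", ControlToken("<|reserved_1|>")]
for chunk in model.complete(prompt):
    print(chunk.text, end="", flush=True)
print()
```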
Build libllm from source:

```
$ mkdir build && cd build
$ cmake ..
$ make -j
```

On macOS, please install OpenMP via Homebrew before running cmake. NOTE: libllm on macOS is currently expected to be very slow, since there is no aarch64 kernel for it yet.

```
% brew install libomp
% export OpenMP_ROOT=$(brew --prefix)/opt/libomp
% mkdir build && cd build
% cmake ..
% make -j
```

To build with CUDA, configure with -DWITH_CUDA=ON. NOTE: specify -DCUDAToolkit_ROOT=<CUDA-DIR> if there are multiple CUDA versions in your OS. Recommended versions are:

```
$ mkdir build && cd build
$ cmake -DWITH_CUDA=ON [-DCUDAToolkit_ROOT=<CUDA-DIR>] ..
$ make -j
```

Example of using the libllm Python API:

```python
from libllm import Model, ControlToken

model = Model("tools/bilibili_index.llmpkg")
prompt = [ControlToken("<|reserved_0|>"), "hi", ControlToken("<|reserved_1|>")]

for chunk in model.complete(prompt):
    print(chunk.text, end="", flush=True)

print("\nDone!")
```

Example of using the libllm Go API:

```go
package main

import (
	"fmt"
	"log"

	"github.com/ling0322/libllm/go/llm"
)

func main() {
	model, err := llm.NewModel("../../tools/bilibili_index.llmpkg", llm.Auto)
	if err != nil {
		log.Fatal(err)
	}

	prompt := llm.NewPrompt()
	prompt.AppendControlToken("<|reserved_0|>")
	prompt.AppendText("hi")
	prompt.AppendControlToken("<|reserved_1|>")

	comp, err := model.Complete(llm.NewCompletionConfig(), prompt)
	if err != nil {
		log.Fatal(err)
	}

	for comp.IsActive() {
		chunk, err := comp.GenerateNextChunk()
		if err != nil {
			log.Fatal(err)
		}
		fmt.Print(chunk.Text)
	}
	fmt.Println()
}
```

Here is an example of exporting the Index-1.9B model from Huggingface:
```
$ cd tools
$ python bilibili_index_exporter.py \
    -huggingface_name IndexTeam/Index-1.9B-Character \
    -quant q4 \
    -output index.llmpkg
```

All required modules related to IndexTeam/Index-1.9B-Character, including the model, tokenizer, and configs, will then be written to index.llmpkg.
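After exporting, the package can be loaded back with the Python API from earlier. A minimal round-trip sketch; the relative path and the <|reserved_0|>/<|reserved_1|> control tokens are assumptions carried over from the examples above:

```python
from libllm import Model, ControlToken

# Load the package written by bilibili_index_exporter.py; the path
# assumes the exporter was run inside tools/ as shown above (assumption).
model = Model("tools/index.llmpkg")

# Same prompt layout as the earlier Python example for Index models.
prompt = [ControlToken("<|reserved_0|>"), "hi", ControlToken("<|reserved_1|>")]
for chunk in model.complete(prompt):
    print(chunk.text, end="", flush=True)
print()
```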