Make your LLM pick from a list of choices.
Or compute the probability of a completion given a prompt, which may be useful in its own right.
Squeeze more out of open source LLMs.
from llama_cpp import Llama
from cappr.llama_cpp.classify import predict

model = Llama("./TinyLLama-v0.Q8_0.gguf", verbose=False)
prompt = """Gary told Spongebob a story:
There once was a man from Peru; who dreamed he was eating his shoe. He
woke with a fright, in the middle of the night, to find that his dream
had come true.
The moral of the story is to"""
completions = (
    "look at the bright side",
    "use your imagination",
    "eat shoes",
)

pred = predict(prompt, completions, model)
print(pred)
# use your imagination
See this page of the documentation for more info on using GGUF models.
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict

model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Which planet is closer to the Sun: Mercury or Earth?"
completions = ("Mercury", "Earth")

pred = predict(prompt, completions, model_and_tokenizer=(model, tokenizer))
print(pred)
# Mercury
See this page of the documentation for more info on using transformers models.
Many prompts start with the same set of instructions, e.g., a system prompt plus a handful of input-output example pairs. Instead of repeatedly running the model over those shared instructions, cache them so that future computations are faster.
Here's an example using cappr.huggingface.classify.cache_model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import cache_model, predict

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model_and_tokenizer = (model, tokenizer)

# Create data
prompt_prefix = '''Instructions: complete the sequence.
Here are examples:
A, B, C => D
1, 2, 3 => 4
Complete this sequence:'''
prompts = ["X, Y =>", "10, 9, 8 =>"]
completions = ["7", "Z", "Hi"]

# Cache prompt_prefix because it's used for all prompts
cached_model_and_tokenizer = cache_model(
    model_and_tokenizer, prompt_prefix
)

# Compute
preds = predict(
    prompts, completions, cached_model_and_tokenizer
)
print(preds)
# ['Z', '7']
Here's an example using cappr.huggingface.classify.log_probs_conditional.
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import log_probs_conditional

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Create data
prompts = ["x y", "a b c"]
completions = ["z", "d e"]

# Compute
log_probs_completions = log_probs_conditional(
    prompts, completions, model_and_tokenizer=(model, tokenizer)
)
# Outputs (rounded) next to their symbolic representation
print(log_probs_completions[0])
# [[-4.5],        [[log Pr(z | x, y)],
#  [-5.6, -3.2]]   [log Pr(d | x, y),    log Pr(e | x, y, d)]]
print(log_probs_completions[1])
# [[-9.7],        [[log Pr(z | a, b, c)],
#  [-0.2, -0.03]]  [log Pr(d | a, b, c), log Pr(e | a, b, c, d)]]
Efficiently aggregate these log-probabilities using cappr.utils.classify.agg_log_probs.
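For instance, here's a minimal sketch continuing from the snippet above (I'm relying on the function's default aggregation; check the docs for its exact semantics):

from cappr.utils.classify import agg_log_probs

# Aggregate each completion's token log-probabilities into a single
# likelihood per completion (log_probs_completions is from the snippet above)
agg = agg_log_probs(log_probs_completions)
print(agg)
# agg[i][j] scores completions[j] given prompts[i]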
For a slightly more advanced demo, see ./demos/huggingface/dpo.ipynb.
Step-by-step and chain-of-thought prompting are highly effective ways to get an LLM to "reason" about more complex tasks. But if you need structured output, step-by-step completions are unwieldy. Use CAPPr to extract the final answer from these sorts of completions, given a list of possible answers.
See this idea in action here in the documentation.
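For instance, here's a minimal sketch of the pattern (the question, the reasoning string, and the tiny model below are stand-ins I made up; the linked docs show the real workflow):

from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Hypothetical step-by-step completion, generated beforehand however you like
step_by_step = "6 times 7 is six sevens: 7 + 7 + 7 + 7 + 7 + 7 = 42."

# Feed the reasoning back in and let CAPPr pick the final answer
prompt = (
    "Q: What is 6 x 7?\n"
    f"Reasoning: {step_by_step}\n"
    "So the final answer is"
)
possible_answers = ("13", "42", "67")
pred = predict(prompt, possible_answers, model_and_tokenizer=(model, tokenizer))
print(pred)
# hopefully: 42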
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict_proba

# Load a model and its tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompts = [
    "Stephen Curry is a",
    "Martina Navratilova was a",
    "Dexter, from the TV Series Dexter's Laboratory, is a",
    "LeBron James is a",
]

# Each of the prompts could be completed with one of these:
class_names = ("basketball player", "tennis player", "scientist")

# Say I expect 2/3 of my data to be about scientists
prior = (1/6, 1/6, 2/3)

# Run CAPPr
pred_probs = predict_proba(
    prompts=prompts,
    completions=class_names,
    model_and_tokenizer=(model, tokenizer),
    batch_size=2,  # whatever fits on your CPU/GPU
    prior=prior,
)
# pred_probs[i,j] = probability that prompts[i] is classified as class_names[j]
print(pred_probs.round(1))
# [[0.5 0.3 0.2]
# [0.3 0.6 0.2]
# [0.1 0.1 0.8]
# [0.8 0.2 0. ]]
# For each prompt, which completion is most likely?
pred_class_idxs = pred_probs.argmax(axis=-1)
preds = [class_names[pred_class_idx] for pred_class_idx in pred_class_idxs]
print(preds)
# ['basketball player',
# 'tennis player',
# 'scientist',
# 'basketball player']
Again, let's predict probabilities.
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict_proba_examples
from cappr import Example

# Load a model and its tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a sequence of Example objects representing your classification tasks
examples = [
    Example(
        prompt="Jodie Foster played",
        completions=("Clarice Starling", "Trinity in The Matrix"),
    ),
    Example(
        prompt="Batman, from Batman: The Animated Series, was played by",
        completions=("Pete Holmes", "Kevin Conroy", "Spongebob!"),
        prior=(1/3, 2/3, 0),
    ),
]

# Run CAPPr
pred_probs = predict_proba_examples(
    examples, model_and_tokenizer=(model, tokenizer)
)

# pred_probs[i][j] = probability that examples[i].prompt is classified as
# examples[i].completions[j]
print([example_pred_probs.round(2) for example_pred_probs in pred_probs])
# [array([0.7, 0.3]),
# array([0.03, 0.97, 0. ])]
# For each example, which completion is most likely?
pred_class_idxs = [
    example_pred_probs.argmax() for example_pred_probs in pred_probs
]
preds = [
    example.completions[pred_class_idx]
    for example, pred_class_idx in zip(examples, pred_class_idxs)
]
print(preds)
# ['Clarice Starling',
# 'Kevin Conroy']
See the demos for demonstrations of slightly harder classification tasks.
For CAPPr, GPTQ models are the most computationally performant. These models are compatible with cappr.huggingface.classify. See this page of the documentation for more info on using these models.
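For instance, here's a sketch of loading a GPTQ model through transformers (this assumes the optimum/auto-gptq extras are installed; the checkpoint name below is just an example I picked):

from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict

# Example GPTQ checkpoint (swap in whichever one you want)
model_name = "TheBloke/Llama-2-7B-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

pred = predict(
    "Which planet is closer to the Sun: Mercury or Earth?",
    ("Mercury", "Earth"),
    model_and_tokenizer=(model, tokenizer),
)
print(pred)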
https://cappr.readthedocs.io
See this page of the documentation.
See this page of the documentation.
Reduce engineering complexity.
See this page of the documentation for more info.
Statistical performance
Computational performance
You input a prompt string, an end_of_prompt string (a whitespace or empty string), and a set of candidate completion strings such that the string
{prompt}{end_of_prompt}{completion}
is a naturally flowing thought. CAPPr picks the completion which is most likely to follow the prompt by computing the
Completion
After
Prompt
Probability
as fleshed out in my question on Cross Validated.
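Roughly speaking, the decision rule looks like this (a paraphrase of the idea, not the library's exact code; I'm assuming mean-aggregation of the token log-probabilities):

import numpy as np

# Sketch of the CAPPr decision rule. log_probs[j] holds the log-probability of
# each token of candidate completion j, conditional on {prompt}{end_of_prompt}
# and the completion tokens before it.
def pick_completion(log_probs: list, prior=None) -> int:
    avg = np.array([np.mean(tokens) for tokens in log_probs])
    likelihoods = np.exp(avg)
    if prior is not None:
        likelihoods = likelihoods * np.array(prior)
    return int(likelihoods.argmax())  # index of the predicted completion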
See this page of the documentation.
I dump TODOs here:
Code changes
Research experiments
Feel free to raise issues, of course.