Progen

Implementation of ProGen in PyTorch, from the paper "ProGen: Language Modeling for Protein Generation".

GPT for protein sequences.

Paper link

Appreciation

Lucidrains
Agorians

Install

pip install progen-torch

Usage

import torch
from progen.model import ProGen

# Random token ids standing in for a tokenized protein sequence
x = torch.randint(0, 100, (1, 1024))

# Initialize the model with specific parameters
model = ProGen(
    num_tokens=100,  # The size of the vocabulary
    dim=512,  # The dimension of the embeddings
    seq_len=1024,  # The length of the sequences
    depth=6,  # The number of layers in the model
    window_size=256,  # The size of the window for local attention
    global_mlp_depth=2,  # The depth of the MLP in the global attention mechanism
    heads=8,  # The number of attention heads
    dim_head=512,  # The dimension of each attention head
    ff_mult=4,  # The multiplier for the feed-forward network's hidden layer size
    ff_glu=True,  # Whether to use a GLU activation in the feed-forward network
    attn_dim=None,  # The dimension of the attention mechanism (None means it defaults to `dim`)
    clamp_gate=True,  # Whether to clamp the gate values in the GLU activation
    shift_tokens=True,  # Whether to shift the tokens for the causal attention mechanism
    dropout=0.1,  # The dropout rate
)

# Forward pass through the model
logits = model(x)

# The output is the logits for each token in the vocabulary, for each position in the input sequences
# Shape: (batch_size, sequence_length, num_tokens)
print(logits.shape)  # Should print: torch.Size([1, 1024, 100])
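
The usage example feeds the model random token ids. With real data, each protein sequence would first be mapped to integer ids below `num_tokens` and padded to `seq_len`. The following is a minimal sketch of such an encoding in plain Python. The vocabulary layout, the special-token ids, and the `encode` helper are all assumptions made for illustration and are not part of progen-torch.

```python
# Hypothetical vocabulary: the tokenizer actually used with progen-torch may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
PAD_ID, BOS_ID, EOS_ID = 0, 1, 2      # assumed special-token ids
TOKEN_TO_ID = {aa: i + 3 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence, seq_len=1024):
    """Map a protein string to a fixed-length list of token ids."""
    ids = [BOS_ID] + [TOKEN_TO_ID[aa] for aa in sequence.upper()] + [EOS_ID]
    if len(ids) > seq_len:
        raise ValueError(f"sequence needs {len(ids)} tokens, limit is {seq_len}")
    # Right-pad with PAD_ID so every example has exactly seq_len tokens.
    return ids + [PAD_ID] * (seq_len - len(ids))


ids = encode("MKTAYIAKQR")
print(len(ids), ids[:5])
```

Wrapping the result as `torch.tensor([ids])` gives a tensor of shape `(1, 1024)`, the same shape as `x` in the usage example, and every id stays below the `num_tokens=100` vocabulary size.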

Dataset Strategy

Here is a table of the datasets used in the paper, with metadata and source links:

Dataset	Description	Source
UniParc	Contains protein sequences from various sources	https://www.uniprot.org/uniparc/
UniProtKB	Contains protein sequences and annotations	https://www.uniprot.org/uniprot/
Swiss-Prot	Protein sequence database	https://www.uniprot.org/swiss-prot/
TrEMBL	Computer-annotated protein sequences	https://www.uniprot.org/trembl/
Pfam	Database of protein families	https://pfam.xfam.org/
NCBI Taxonomy	Taxonomic classification of organisms	https://www.ncbi.nlm.nih.gov/taxonomy

Here is a diagram showing the data preprocessing flow:

graph TD
    A[UniParc] --> B[Filter and merge]
    C[UniProtKB] --> B
    D[Swiss-Prot] --> B
    E[TrEMBL] --> B
    F[Pfam] --> B
    G[NCBI Taxonomy] --> B
    B --> H[Train/Test Split]
    H --> I[Train set]
    H --> J[ID test set]
    H --> K[OOD test set]

The UniParc, UniProtKB, Swiss-Prot, TrEMBL, Pfam, and NCBI Taxonomy datasets are filtered and merged in step B. The merged dataset is then split into training, in-distribution (ID) test, and out-of-distribution (OOD) test sets in step H.
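
Steps B and H can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the pipeline used in the paper or in this repository: the record layout (`sequence`, `family`), the deduplication and length filter, the choice to build the OOD set by holding out whole protein families, and the function names `merge_and_filter` and `split` are all hypothetical.

```python
import random


def merge_and_filter(sources, max_len=1024):
    """Step B (assumed rule): merge all sources, drop duplicates and overlong sequences."""
    seen, merged = set(), []
    for records in sources:
        for rec in records:
            seq = rec["sequence"]
            if seq in seen or not (0 < len(seq) <= max_len):
                continue
            seen.add(seq)
            merged.append(rec)
    return merged


def split(records, ood_families, id_test_fraction=0.1, seed=0):
    """Step H (assumed rule): hold out whole families as OOD, split the rest at random."""
    ood_test = [r for r in records if r["family"] in ood_families]
    rest = [r for r in records if r["family"] not in ood_families]
    random.Random(seed).shuffle(rest)
    n_id = int(len(rest) * id_test_fraction)
    return rest[n_id:], rest[:n_id], ood_test


# Toy records standing in for two of the sources in the table.
uniparc = [
    {"sequence": "MKT", "family": "PF00001"},
    {"sequence": "MAA", "family": "PF00002"},
]
swiss_prot = [
    {"sequence": "MKT", "family": "PF00001"},  # duplicate of a UniParc record
    {"sequence": "MGG", "family": "PF00003"},
]

merged = merge_and_filter([uniparc, swiss_prot])
train, id_test, ood_test = split(merged, ood_families={"PF00003"}, id_test_fraction=0.5)
print(len(train), len(id_test), len(ood_test))
```

Holding out entire families means the OOD test set contains no sequence whose family appears in training, whereas the ID test set is drawn from the same families as the training set.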

License

MIT

Citations

Additional information

Version: 1.0.0
Type: Other source code
Updated: 2025-03-08
Size: 212.98KB
Source: GitHub
