Progen

Implementation of ProGen in PyTorch, from the paper "ProGen: Language Modeling for Protein Generation"

GPT for protein sequences

Paper Link

Appreciation

Lucidrains
Agorians

Install

pip install progen-torch

Usage

import torch
from progen.model import ProGen

# A batch of one sequence with 1024 random token ids in [0, 100)
x = torch.randint(0, 100, (1, 1024))

# Initialize the model with specific parameters
model = ProGen(
    num_tokens=100,  # The size of the vocabulary
    dim=512,  # The dimension of the embeddings
    seq_len=1024,  # The length of the sequences
    depth=6,  # The number of layers in the model
    window_size=256,  # The size of the window for local attention
    global_mlp_depth=2,  # The depth of the MLP in the global attention mechanism
    heads=8,  # The number of attention heads
    dim_head=512,  # The dimension of each attention head
    ff_mult=4,  # The multiplier for the feed-forward network's hidden layer size
    ff_glu=True,  # Whether to use a GLU activation in the feed-forward network
    attn_dim=None,  # The dimension of the attention mechanism (None means it defaults to `dim`)
    clamp_gate=True,  # Whether to clamp the gate values in the GLU activation
    shift_tokens=True,  # Whether to shift the tokens for the causal attention mechanism
    dropout=0.1,  # The dropout rate
)

# Forward pass through the model
logits = model(x)

# The output is the logits for each token in the vocabulary, for each position in the input sequences
# Shape: (batch_size, sequence_length, num_tokens)
print(logits.shape)  # Should print: torch.Size([1, 1024, 100])
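
The usage example feeds the model random token ids. For real protein sequences, each residue has to be mapped to an integer id first. This README does not describe the tokenization used by progen-torch, so the following is only a minimal sketch under an assumed vocabulary: the 20 standard amino acids plus three hypothetical special tokens (`<pad>`, `<bos>`, `<eos>`). The names `encode`, `decode`, and `TOKEN_TO_ID` are illustrative and not part of the library.

```python
# Assumed vocabulary (NOT taken from progen-torch): 3 special tokens + 20 amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SPECIAL_TOKENS = ["<pad>", "<bos>", "<eos>"]

# Special tokens take ids 0..2, amino acids take ids 3..22.
TOKEN_TO_ID = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
TOKEN_TO_ID.update({aa: i + len(SPECIAL_TOKENS) for i, aa in enumerate(AMINO_ACIDS)})
ID_TO_TOKEN = {i: tok for tok, i in TOKEN_TO_ID.items()}


def encode(sequence, seq_len):
    """Map a protein string to a list of exactly `seq_len` token ids."""
    ids = [TOKEN_TO_ID["<bos>"]]
    # A residue outside the 20 standard letters raises KeyError here.
    ids += [TOKEN_TO_ID[aa] for aa in sequence.upper()]
    ids.append(TOKEN_TO_ID["<eos>"])
    if len(ids) > seq_len:
        raise ValueError(f"sequence needs {len(ids)} tokens, but seq_len is {seq_len}")
    # Right-pad with <pad> up to the fixed length.
    return ids + [TOKEN_TO_ID["<pad>"]] * (seq_len - len(ids))


def decode(ids):
    """Invert `encode`, dropping the special tokens."""
    return "".join(ID_TO_TOKEN[i] for i in ids if ID_TO_TOKEN[i] not in SPECIAL_TOKENS)


ids = encode("MKTAYIAKQR", seq_len=16)
print(ids)
print(decode(ids))  # MKTAYIAKQR
```

With this assumed vocabulary of 23 ids, `torch.tensor([encode(seq, 1024)])` would give a tensor of shape (1, 1024) that fits the `num_tokens = 100` and `seq_len = 1024` settings shown above.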

Dataset Strategy

Here is a table of the datasets used in the paper with metadata and source links:

Dataset	Description	Source
UniParc	Contains protein sequences from various sources	https://www.uniprot.org/uniparc/
UniProtKB	Contains protein sequences and annotations	https://www.uniprot.org/uniprot/
Swiss-Prot	Curated protein sequence database	https://www.uniprot.org/swiss-prot/
TrEMBL	Computer-annotated protein sequences	https://www.uniprot.org/trembl/
Pfam	Database of protein families	https://pfam.xfam.org/
NCBI taxonomy	Taxonomic classification of organisms	https://www.ncbi.nlm.nih.gov/taxonomy

Here is a diagram showing the data preprocessing flow:

 graph TD
    A[UniParc] --> B[Filter and merge]
    C[UniProtKB] --> B
    D[Swiss-Prot] --> B
    E[TrEMBL] --> B
    F[Pfam] --> B
    G[NCBI taxonomy] --> B
    B --> H[Train/test split]
    H --> I[Train set]
    H --> J[ID test set]
    H --> K[OOD test set]

The UniParc, UniProtKB, Swiss-Prot, TrEMBL, Pfam, and NCBI taxonomy datasets are filtered and merged in step B. The aggregated dataset is then split into the training set, the in-distribution (ID) test set, and the out-of-distribution (OOD) test set in step H.
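
The merge and split steps can be sketched in a few lines of plain Python. This README does not say how records are filtered or what makes a test example out-of-distribution, so the sketch below rests on two assumptions: duplicates are detected by exact sequence match, and the OOD test set consists of whole protein families held out from training. The record fields (`sequence`, `family`) and the function names are hypothetical.

```python
import random


def merge_sources(*sources):
    """Merge records from several sources, keeping one record per unique sequence."""
    merged = {}
    for records in sources:
        for rec in records:
            # First occurrence wins; later exact duplicates are dropped.
            merged.setdefault(rec["sequence"], rec)
    return list(merged.values())


def split_dataset(records, ood_families, id_test_fraction=0.1, seed=0):
    """Split records into train, in-distribution test, and out-of-distribution test."""
    # Assumption: every record from a held-out family goes to the OOD test set.
    ood_test = [r for r in records if r["family"] in ood_families]
    remaining = [r for r in records if r["family"] not in ood_families]

    # The ID test set is a random sample of the families also seen in training.
    rng = random.Random(seed)
    rng.shuffle(remaining)
    n_id = int(len(remaining) * id_test_fraction)
    return {
        "train": remaining[n_id:],
        "id_test": remaining[:n_id],
        "ood_test": ood_test,
    }


# Toy data: two overlapping sources with four made-up families.
source_a = [{"sequence": "A" * (i + 1), "family": f"fam{i % 4}"} for i in range(20)]
source_b = [{"sequence": "A" * (i + 1), "family": f"fam{i % 4}"} for i in range(10, 30)]

merged = merge_sources(source_a, source_b)
splits = split_dataset(merged, ood_families={"fam3"})
print({name: len(rows) for name, rows in splits.items()})
```

Holding out entire families, rather than random rows, keeps the OOD test set free of any family the model saw during training, which is the property an out-of-distribution evaluation is meant to check.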

License

MIT
