DeepRank GNN esm

1.0.0

Archival note

DeepRank-GNN is no longer under active development, so the DeepRank-GNN-ESM version has been migrated to a new repository at Haddocking/DeepRank-GNN-ESM.

For details, refer to "DeepRank-GNN-esm: a graph neural network for scoring protein-protein models using protein language model" (https://academic.oup.com/bioinformaticsadvances/article/4/1/vbad191/7511844).

❄️ This repository is now frozen. ❄️

DeepRank-GNN-ESM

Graph network for protein-protein interfaces, including language model features

Installation

With Anaconda

Clone the repository

git clone https://github.com/DeepRank/DeepRank-GNN-esm.git
cd DeepRank-GNN-esm

Install either the CPU or the GPU version of DeepRank-GNN-ESM

conda env create -f environment-cpu.yml && conda activate deeprank-gnn-esm-cpu-env

OR

conda env create -f environment-gpu.yml && conda activate deeprank-gnn-esm-gpu-env

Install the command-line tool

pip install .

Run the tests to make sure everything is working

pytest tests/

Usage

As a scoring function

We provide a command-line interface for DeepRank-GNN-ESM that can be used to score protein-protein complexes. It is used as follows:

usage: deeprank-gnn-esm-predict [-h] pdb_file chain_id_1 chain_id_2

positional arguments:
  pdb_file    Path to the PDB file.
  chain_id_1  First chain ID.
  chain_id_2  Second chain ID.

optional arguments:
  -h, --help  show this help message and exit
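
The command-line interface can also be driven from a script when several complexes need scoring. The following is a minimal sketch, assuming `deeprank-gnn-esm-predict` is on the `PATH` of the active environment; the helper names `build_predict_command` and `score_complex` are hypothetical and not part of the package.

```python
import subprocess
from pathlib import Path


def build_predict_command(pdb_file, chain_id_1, chain_id_2):
    """Return the argument list for deeprank-gnn-esm-predict (hypothetical helper)."""
    return ["deeprank-gnn-esm-predict", str(Path(pdb_file)), chain_id_1, chain_id_2]


def score_complex(pdb_file, chain_id_1, chain_id_2):
    """Run the CLI on one complex; raises CalledProcessError on a non-zero exit."""
    command = build_predict_command(pdb_file, chain_id_1, chain_id_2)
    subprocess.run(command, check=True)
```

Under these assumptions, `score_complex("1B6C.pdb", "A", "B")` would issue the same command as typing `deeprank-gnn-esm-predict 1B6C.pdb A B` in the shell.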

For example, to score the 1B6C complex:

 # download it
$ wget https://files.rcsb.org/view/1B6C.pdb -q

# make sure the environment is activated
$ conda activate deeprank-gnn-esm-gpu-env
(deeprank-gnn-esm-gpu-env) $ deeprank-gnn-esm-predict 1B6C.pdb A B
 2023-06-28 06:08:21,889 predict:64 INFO - Setting up workspace - /home/DeepRank-GNN-esm/1B6C-gnn_esm_pred_A_B
 2023-06-28 06:08:21,945 predict:72 INFO - Renumbering PDB file.
 2023-06-28 06:08:22,294 predict:104 INFO - Reading sequence of PDB 1B6C.pdb
 2023-06-28 06:08:22,423 predict:131 INFO - Generating embedding for protein sequence.
 2023-06-28 06:08:22,423 predict:132 INFO - # ###############################################################################
 2023-06-28 06:08:32,447 predict:138 INFO - Transferred model to GPU
 2023-06-28 06:08:32,450 predict:147 INFO - Read /home/1B6C-gnn_esm_pred_A_B/all.fasta with 2 sequences
 2023-06-28 06:08:32,459 predict:157 INFO - Processing 1 of 1 batches (2 sequences)
 2023-06-28 06:08:36,462 predict:200 INFO - # ###############################################################################
 2023-06-28 06:08:36,470 predict:205 INFO - Generating graph, using 79 processors
 Graphs added to the HDF5 file
 Embedding added to the /home/1B6C-gnn_esm_pred_A_B/graph.hdf5 file file
 2023-06-28 06:09:03,345 predict:220 INFO - Graph file generated: /home/DeepRank-GNN-esm/1B6C-gnn_esm_pred_A_B/graph.hdf5
 2023-06-28 06:09:03,345 predict:226 INFO - Predicting fnat of protein complex.
 2023-06-28 06:09:03,345 predict:234 INFO - Using device: cuda:0
 # ...
 2023-06-28 06:09:07,794 predict:280 INFO - Predicted fnat for 1B6C between chainA and chainB: 0.359
 2023-06-28 06:09:07,803 predict:290 INFO - Output written to /home/DeepRank-GNN-esm/1B6C-gnn_esm_pred/GNN_esm_prediction.csv

The output in GNN_esm_prediction.csv shows that the predicted fnat for the 1B6C complex between chain A and chain B is 0.359.

The command above generates a folder in the current working directory containing the following:

 1B6C-gnn_esm_pred_A_B
├── 1B6C.pdb                   #input pdb file 
├── all.fasta                  #fasta sequence for the pdb input 
├── 1B6C.A.pt                  #esm-2 embedding for chainA in protein 1B6C
├── 1B6C.B.pt                  #esm-2 embedding for chainB in protein 1B6C
├── graph.hdf5                 #input protein graph in hdf5 format 
├── GNN_esm_prediction.hdf5    #prediction output in hdf5 format
└── GNN_esm_prediction.csv     #prediction output in csv format
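
To pick up the result programmatically, the CSV file can be read with the standard library. This is a sketch only: the folder name follows the pattern seen in the example above (`1B6C-gnn_esm_pred_A_B`), the column names of the CSV are not documented here so the rows are returned as-is, and `output_folder` and `read_predictions` are hypothetical helper names.

```python
import csv
from pathlib import Path


def output_folder(pdb_file, chain_id_1, chain_id_2, workdir="."):
    """Build the output folder path following the naming seen in the example."""
    stem = Path(pdb_file).stem
    return Path(workdir) / f"{stem}-gnn_esm_pred_{chain_id_1}_{chain_id_2}"


def read_predictions(csv_path):
    """Return every row of the prediction CSV as a dict keyed by the header row."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))
```

A call such as `read_predictions(output_folder("1B6C.pdb", "A", "B") / "GNN_esm_prediction.csv")` would then return the rows written by the prediction run.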

As a framework

Generate ESM-2 embeddings for your protein

To generate FASTA sequences in bulk, use the script 'get_fasta.py':

usage: get_fasta.py [-h] pdb_dir output_fasta_name

positional arguments:
  pdb_dir            Path to the directory containing PDB files
  output_fasta_name  Name of the combined output FASTA file

options:
  -h, --help         show this help message and exit
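
To illustrate what this step produces, the sketch below derives one sequence per chain from the ATOM records of a single PDB file using only the standard library. It is not the repository's `get_fasta.py`, which works on a whole directory; the function names are hypothetical, and the `>1B6C.A`-style header is an assumption inferred from the embedding file names (`1B6C.A.pt`) listed earlier.

```python
from pathlib import Path

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_lines):
    """Map chain ID -> one-letter sequence, read from ATOM records."""
    sequences = {}
    seen = set()
    for line in pdb_lines:
        # Skip non-ATOM records and lines too short to hold a residue number.
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        res_name = line[17:20].strip()
        chain_id = line[21]
        res_key = (chain_id, line[22:27])  # residue number plus insertion code
        if res_key in seen:
            continue  # further atoms of a residue already counted
        seen.add(res_key)
        # Unknown residue names are written as X rather than dropped.
        sequences[chain_id] = sequences.get(chain_id, "") + THREE_TO_ONE.get(res_name, "X")
    return sequences


def pdb_to_fasta(pdb_path):
    """Return FASTA text with one record per chain of a single PDB file."""
    path = Path(pdb_path)
    records = chain_sequences(path.read_text().splitlines())
    return "".join(f">{path.stem}.{chain}\n{seq}\n" for chain, seq in records.items())
```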

To generate embeddings in bulk from the combined FASTA file, use the script provided inside the ESM-2 package:
```
$ python esm_2_installation_location/scripts/extract.py \
    esm2_t33_650M_UR50D \
    all.fasta \
    tests/data/embedding/1ATN/ \
    --repr_layers 0 32 33 \
    --include mean per_tok
```
Replace 'esm_2_installation_location' with your installation location, 'all.fasta' with the FASTA file generated above, and 'tests/data/embedding/1ATN/' with the output folder for the ESM embeddings.

Generate graphs

Example code to generate residue graphs in HDF5 format:

from deeprank_gnn.GraphGenMP import GraphHDF5

pdb_path = "tests/data/pdb/1ATN/"
pssm_path = "tests/data/pssm/1ATN/"
embedding_path = "tests/data/embedding/1ATN/"
nproc = 20
outfile = "1ATN_residue.hdf5"

GraphHDF5(
    pdb_path=pdb_path,
    pssm_path=pssm_path,
    embedding_path=embedding_path,
    graph_type="residue",
    outfile=outfile,
    nproc=nproc,  # number of cores to use
    tmpdir="./tmpdir",
)

Example code to add continuous or binary targets to the HDF5 file:

import h5py
import random

# Open the graph file in read/write mode so targets can be added in place
hdf5_file = h5py.File("1ATN_residue.hdf5", "r+")
for mol in hdf5_file.keys():
    fnat = random.random()  # random example value
    bin_class = [1 if fnat > 0.3 else 0]
    hdf5_file.create_dataset(f"/{mol}/score/binclass", data=bin_class)
    hdf5_file.create_dataset(f"/{mol}/score/fnat", data=fnat)
hdf5_file.close()
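
The example above fills the targets with random numbers. When real fnat values are available, they could be loaded from a table first; the sketch below assumes a hypothetical two-column CSV (`model,fnat`) and derives the continuous and binary targets with the same 0.3 cutoff. The file layout and the helper names `binarize` and `load_targets` are assumptions, not part of DeepRank-GNN-ESM.

```python
import csv
import io


def binarize(fnat, threshold=0.3):
    """Return 1 when fnat exceeds the threshold, else 0."""
    return 1 if fnat > threshold else 0


def load_targets(handle, threshold=0.3):
    """Read 'model,fnat' rows and return {model: (fnat, binclass)}."""
    targets = {}
    for row in csv.DictReader(handle):
        fnat = float(row["fnat"])
        targets[row["model"]] = (fnat, binarize(fnat, threshold))
    return targets


# Hypothetical table; the model names must match the group names in the HDF5 file.
example = io.StringIO("model,fnat\nmodel_1,0.45\nmodel_2,0.10\n")
print(load_targets(example))
```

The resulting dictionary can then replace the `random.random()` call when the datasets are created.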

Use pre-trained models to predict

Example code to use the pre-trained DeepRank-GNN-ESM model:

from deeprank_gnn.ginet import GINet
from deeprank_gnn.NeuralNet import NeuralNet

database_test = "1ATN_residue.hdf5"
gnn = GINet
target = "fnat"
edge_attr = ["dist"]
threshold = 0.3
pretrained_model = "deeprank-GNN-esm/paper_pretrained_models/scoring_of_docking_models/gnn_esm/treg_yfnat_b64_e20_lr0.001_foldall_esm.pth.tar"
node_feature = ["type", "polarity", "bsa", "charge", "embedding"]
device_name = "cuda:0"
num_workers = 10

model = NeuralNet(
    database_test,
    gnn,
    device_name=device_name,
    edge_feature=edge_attr,
    node_feature=node_feature,
    target=target,
    num_workers=num_workers,
    pretrained_model=pretrained_model,
    threshold=threshold,
)

# Write the predictions to an HDF5 file
model.test(hdf5="tmpdir/GNN_esm_prediction.hdf5")
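
The example hard-codes `cuda:0` as the device. On a machine without a GPU a fallback may be useful; the following is a minimal sketch, assuming `NeuralNet` accepts `"cpu"` as `device_name` (not confirmed by the text above). The helper name `pick_device` is hypothetical.

```python
def pick_device(preferred="cuda:0"):
    """Return the preferred CUDA device if PyTorch can see a GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is no GPU to select.
        return "cpu"
    return preferred if torch.cuda.is_available() else "cpu"


device_name = pick_device()
print(f"Using device: {device_name}")
```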