Archival note

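Since DeepRank-GNN is no longer in active development, we have migrated the DeepRank-GNN-esm version to a new repository at haddocking/DeepRank-GNN-esm.

For details, refer to our publication "DeepRank-GNN-esm: a graph neural network for scoring protein-protein models using protein language model".

❄️ This repository is now frozen. ❄️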

DeepRank-GNN-esm

Graph network for protein-protein interfaces, including language model features

Installation

With Anaconda

Clone the repository

git clone https://github.com/DeepRank/DeepRank-GNN-esm.git
cd DeepRank-GNN-esm

Install either the CPU or the GPU version of DeepRank-GNN-esm

conda env create -f environment-cpu.yml && conda activate deeprank-gnn-esm-cpu-env

or

conda env create -f environment-gpu.yml && conda activate deeprank-gnn-esm-gpu-env

Install the command-line tool

pip install .

Run the tests to make sure everything is working

pytest tests/

Usage

As a scoring function

We provide a command-line interface for DeepRank-GNN-esm that can be used to score protein-protein complexes. It is invoked as follows:

usage: deeprank-gnn-esm-predict [-h] pdb_file chain_id_1 chain_id_2

positional arguments:
  pdb_file    Path to the PDB file.
  chain_id_1  First chain ID.
  chain_id_2  Second chain ID.

optional arguments:
  -h, --help  show this help message and exit
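To score several complexes from a script, the tool can be called through Python's `subprocess` module. The sketch below relies only on the three positional arguments listed in the usage text above; the helper names `build_predict_command` and `score_complex` are illustrative and are not part of DeepRank-GNN-esm.

```python
import subprocess
from pathlib import Path


def build_predict_command(pdb_file, chain_id_1, chain_id_2):
    """Return the argument list for one deeprank-gnn-esm-predict call."""
    return ["deeprank-gnn-esm-predict", str(Path(pdb_file)), chain_id_1, chain_id_2]


def score_complex(pdb_file, chain_id_1, chain_id_2):
    """Run the CLI for one complex (the conda environment must be active)."""
    command = build_predict_command(pdb_file, chain_id_1, chain_id_2)
    # check=True raises CalledProcessError if the tool exits with an error.
    return subprocess.run(command, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Printing the command is safe even when the tool is not installed.
    print(" ".join(build_predict_command("1B6C.pdb", "A", "B")))
```

Passing the arguments as a list avoids shell quoting problems when a PDB path contains spaces.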

For example, to score the 1B6C complex:

# download it
$ wget https://files.rcsb.org/view/1B6C.pdb -q

# make sure the environment is activated
$ conda activate deeprank-gnn-esm-gpu-env
(deeprank-gnn-esm-gpu-env) $ deeprank-gnn-esm-predict 1B6C.pdb A B
 2023-06-28 06:08:21,889 predict:64 INFO - Setting up workspace - /home/DeepRank-GNN-esm/1B6C-gnn_esm_pred_A_B
 2023-06-28 06:08:21,945 predict:72 INFO - Renumbering PDB file.
 2023-06-28 06:08:22,294 predict:104 INFO - Reading sequence of PDB 1B6C.pdb
 2023-06-28 06:08:22,423 predict:131 INFO - Generating embedding for protein sequence.
 2023-06-28 06:08:22,423 predict:132 INFO - # ###############################################################################
 2023-06-28 06:08:32,447 predict:138 INFO - Transferred model to GPU
 2023-06-28 06:08:32,450 predict:147 INFO - Read /home/1B6C-gnn_esm_pred_A_B/all.fasta with 2 sequences
 2023-06-28 06:08:32,459 predict:157 INFO - Processing 1 of 1 batches (2 sequences)
 2023-06-28 06:08:36,462 predict:200 INFO - # ###############################################################################
 2023-06-28 06:08:36,470 predict:205 INFO - Generating graph, using 79 processors
 Graphs added to the HDF5 file
 Embedding added to the /home/1B6C-gnn_esm_pred_A_B/graph.hdf5 file file
 2023-06-28 06:09:03,345 predict:220 INFO - Graph file generated: /home/DeepRank-GNN-esm/1B6C-gnn_esm_pred_A_B/graph.hdf5
 2023-06-28 06:09:03,345 predict:226 INFO - Predicting fnat of protein complex.
 2023-06-28 06:09:03,345 predict:234 INFO - Using device: cuda:0
 # ...
 2023-06-28 06:09:07,794 predict:280 INFO - Predicted fnat for 1B6C between chainA and chainB: 0.359
 2023-06-28 06:09:07,803 predict:290 INFO - Output written to /home/DeepRank-GNN-esm/1B6C-gnn_esm_pred/GNN_esm_prediction.csv

The output above shows that the predicted fnat for the 1B6C complex between chain A and chain B is 0.359. This value is also written to the GNN_esm_prediction.csv file.
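If the log has been captured as text, the predicted value can be pulled out with a regular expression. This is a minimal sketch whose pattern is modelled on the final "Predicted fnat" log line shown above; the log format may differ in other versions, and `parse_predicted_fnat` is a hypothetical helper name.

```python
import re

# Pattern modelled on the "Predicted fnat" log line of the example run.
FNAT_LINE = re.compile(
    r"Predicted fnat for (?P<pdb>\S+) between chain(?P<chain1>\S+) "
    r"and chain(?P<chain2>\S+): (?P<fnat>[0-9]*\.?[0-9]+)"
)


def parse_predicted_fnat(log_text):
    """Return (pdb, chain1, chain2, fnat) from the log, or None if absent."""
    match = FNAT_LINE.search(log_text)
    if match is None:
        return None
    return (match["pdb"], match["chain1"], match["chain2"], float(match["fnat"]))


example_line = (
    "2023-06-28 06:09:07,794 predict:280 INFO - "
    "Predicted fnat for 1B6C between chainA and chainB: 0.359"
)
print(parse_predicted_fnat(example_line))
```

For anything beyond a quick check, reading the CSV output is likely more robust than parsing log text.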

The command above generates a folder in the current working directory with the following contents:

 1B6C-gnn_esm_pred_A_B
├── 1B6C.pdb                   #input pdb file 
├── all.fasta                  #fasta sequence for the pdb input 
├── 1B6C.A.pt                  #esm-2 embedding for chainA in protein 1B6C
├── 1B6C.B.pt                  #esm-2 embedding for chainB in protein 1B6C
├── graph.hdf5                 #input protein graph in hdf5 format 
├── GNN_esm_prediction.hdf5    #prediction output in hdf5 format
└── GNN_esm_prediction.csv     #prediction output in csv format
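The CSV file in this folder can be read with Python's standard `csv` module. The column names are not documented here, so the sketch below makes no assumption about them and simply returns each row as a dictionary keyed by whatever header the tool wrote; `read_prediction_csv` is an illustrative helper name.

```python
import csv
from pathlib import Path


def read_prediction_csv(csv_path):
    """Return the rows of a prediction CSV as a list of dictionaries."""
    with Path(csv_path).open(newline="") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    csv_path = Path("1B6C-gnn_esm_pred_A_B") / "GNN_esm_prediction.csv"
    if csv_path.exists():
        for row in read_prediction_csv(csv_path):
            print(row)  # keys are the column names found in the file
    else:
        print(f"{csv_path} not found; run deeprank-gnn-esm-predict first")
```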

As a framework

Generate ESM-2 embeddings for your protein

To generate FASTA sequences in bulk, use the script 'get_fasta.py':

usage: get_fasta.py [-h] pdb_dir output_fasta_name

positional arguments:
  pdb_dir            Path to the directory containing PDB files
  output_fasta_name  Name of the combined output FASTA file

options:
  -h, --help         show this help message and exit
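To illustrate what extracting sequences from a PDB file involves, here is a minimal sketch in plain Python. It is not the implementation of get_fasta.py: it reads only the CA atoms of ATOM records, ignores HETATM residues and multiple models, and the `<model>.<chain>` header format is an assumption modelled on embedding file names such as 1B6C.A.pt.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_lines):
    """Map chain ID -> one-letter sequence, read from CA atoms of ATOM records."""
    sequences = {}
    seen = set()
    for line in pdb_lines:
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain_id = line[21]
        residue_key = (chain_id, line[22:27])  # residue number + insertion code
        if residue_key in seen:  # skip alternate locations of the same residue
            continue
        seen.add(residue_key)
        residue = THREE_TO_ONE.get(line[17:20], "X")  # X for unknown residues
        sequences[chain_id] = sequences.get(chain_id, "") + residue
    return sequences


def to_fasta(model_name, sequences):
    """Format the sequences as FASTA records named <model>.<chain> (assumed)."""
    return "".join(f">{model_name}.{chain}\n{seq}\n" for chain, seq in sequences.items())


example_lines = [
    "ATOM      1  CA  MET A   1",
    "ATOM      2  CA  LYS A   2",
    "ATOM      3  N   GLY B   1",
    "ATOM      4  CA  GLY B   1",
]
print(to_fasta("1ATN", chain_sequences(example_lines)), end="")
```

The column slices follow the fixed-width layout of PDB ATOM records, which is why the example lines keep their exact spacing.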

To generate embeddings in bulk from the combined FASTA file, use the script provided in the ESM-2 package:
```
$ python esm_2_installation_location/scripts/extract.py \
    esm2_t33_650M_UR50D \
    all.fasta \
    tests/data/embedding/1ATN/ \
    --repr_layers 0 32 33 \
    --include mean per_tok
```
Replace 'esm_2_installation_location' with your ESM-2 installation location, 'all.fasta' with the FASTA file generated above, and 'tests/data/embedding/1ATN/' with the output folder for the ESM embeddings.
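Before building graphs, it can help to confirm that an embedding file exists for every sequence. The sketch below assumes that each FASTA record produces a file named `<header>.pt` in the output folder, which matches file names such as 1B6C.A.pt but is not stated explicitly here; `fasta_labels` and `missing_embeddings` are illustrative helper names.

```python
from pathlib import Path


def fasta_labels(fasta_path):
    """Return the header line (without '>') of every record in a FASTA file."""
    labels = []
    with Path(fasta_path).open() as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                if header:
                    labels.append(header)
    return labels


def missing_embeddings(fasta_path, embedding_dir):
    """List labels that have no <label>.pt file in embedding_dir (assumed naming)."""
    embedding_dir = Path(embedding_dir)
    return [
        label
        for label in fasta_labels(fasta_path)
        if not (embedding_dir / f"{label}.pt").is_file()
    ]


if __name__ == "__main__":
    fasta, out_dir = Path("all.fasta"), Path("tests/data/embedding/1ATN/")
    if fasta.exists():
        print("Missing embeddings:", missing_embeddings(fasta, out_dir))
```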

Generate graphs

Example code to generate residue graphs in HDF5 format:

from deeprank_gnn.GraphGenMP import GraphHDF5

# Input folders for the structures, PSSMs, and ESM-2 embeddings
pdb_path = "tests/data/pdb/1ATN/"
pssm_path = "tests/data/pssm/1ATN/"
embedding_path = "tests/data/embedding/1ATN/"
nproc = 20
outfile = "1ATN_residue.hdf5"

GraphHDF5(
    pdb_path=pdb_path,
    pssm_path=pssm_path,
    embedding_path=embedding_path,
    graph_type="residue",
    outfile=outfile,
    nproc=nproc,  # number of cores to use
    tmpdir="./tmpdir",
)
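The example hard-codes `nproc = 20`, which may exceed the number of cores on a smaller machine. One possible way to pick a value is to cap the request at the CPUs that Python reports; `choose_nproc` is a hypothetical helper, not part of deeprank_gnn.

```python
import os


def choose_nproc(requested=20):
    """Cap the requested number of worker processes at the CPUs available."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(requested, available))


nproc = choose_nproc(20)
print(f"Generating graphs with {nproc} processes")
```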

Example code to add continuous or binary targets to the HDF5 file:

import random

import h5py

# Open the graph file in read/write mode; the context manager closes it on exit.
with h5py.File("1ATN_residue.hdf5", "r+") as hdf5_file:
    for mol in hdf5_file.keys():
        fnat = random.random()  # random placeholder for the target value
        bin_class = [1 if fnat > 0.3 else 0]  # binary target from the 0.3 cutoff
        hdf5_file.create_dataset(f"/{mol}/score/binclass", data=bin_class)
        hdf5_file.create_dataset(f"/{mol}/score/fnat", data=fnat)
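The random numbers in the example are placeholders for real target values. As a sketch of how known fnat values could be mapped to the same dataset layout, the hypothetical helper below builds the `/<mol>/score/...` paths and applies the 0.3 cutoff from the example; the model names and fnat values are made up for illustration.

```python
def build_targets(fnat_by_model, threshold=0.3):
    """Map HDF5 dataset paths to target values for each model."""
    targets = {}
    for mol, fnat in fnat_by_model.items():
        targets[f"/{mol}/score/fnat"] = fnat
        # Binary class: 1 when fnat exceeds the threshold, otherwise 0.
        targets[f"/{mol}/score/binclass"] = [1 if fnat > threshold else 0]
    return targets


# Hypothetical fnat values, used only to show the resulting layout.
example = build_targets({"model_1": 0.45, "model_2": 0.10})
for path, value in example.items():
    print(path, value)
```

Each path and value pair can then be passed to `create_dataset` in place of the random values.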

Use pre-trained models to predict

Example code to use the pre-trained DeepRank-GNN-esm model:

from deeprank_gnn.ginet import GINet
from deeprank_gnn.NeuralNet import NeuralNet

database_test = "1ATN_residue.hdf5"
gnn = GINet
target = "fnat"
edge_attr = ["dist"]
threshold = 0.3
pretrained_model = "deeprank-GNN-esm/paper_pretrained_models/scoring_of_docking_models/gnn_esm/treg_yfnat_b64_e20_lr0.001_foldall_esm.pth.tar"
node_feature = ["type", "polarity", "bsa", "charge", "embedding"]
device_name = "cuda:0"
num_workers = 10

model = NeuralNet(
    database_test,
    gnn,
    device_name=device_name,
    edge_feature=edge_attr,
    node_feature=node_feature,
    target=target,
    num_workers=num_workers,
    pretrained_model=pretrained_model,
    threshold=threshold,
)

# Write the predictions to an HDF5 file
model.test(hdf5="tmpdir/GNN_esm_prediction.hdf5")
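The example sets `device_name = "cuda:0"`, which requires a GPU. A possible fallback for CPU-only machines is sketched below. It assumes that `NeuralNet` also accepts "cpu" as `device_name`, which is not confirmed here, and `pick_device` is a hypothetical helper.

```python
def pick_device(preferred="cuda:0"):
    """Return `preferred` when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch no GPU can be used
    return preferred if torch.cuda.is_available() else "cpu"


device_name = pick_device()
print(f"Using device: {device_name}")
```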