
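CBGBench: Complex Binding Graph Benchmark is a benchmark for generative target-aware molecule design.
Paper [DiffBP] | Paper [D3FG] | Benchmark
Note
This is the official code repository of the paper "CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph", which aims to unify target-aware molecule design within a single code implementation. So far, we have included 7 methods, as follows: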
| Model | Paper link | GitHub |
|---|---|---|
| Pocket2Mol | https://arxiv.org/abs/2205.07249 | https://github.com/pengxingang/pocket2mol |
| GraphBP | https://arxiv.org/abs/2204.09410 | https://github.com/divelab/graphbp |
| DiffSBDD | https://arxiv.org/abs/2210.13695 | https://github.com/arneschneuing/diffsbdd |
| DiffBP | https://arxiv.org/abs/2211.11214 | This is the official implementation. |
| TargetDiff | https://arxiv.org/abs/2303.03543 | https://github.com/guanjq/targetdiff |
| FLAG | https://openreview.net/forum?id=rq13IDF0F73 | https://github.com/zaixizhang/flag |
| D3FG | https://arxiv.org/abs/2306.13769 | This is the official implementation. |
These models were originally designed for de novo generation. We extend them to more tasks, including linker design, fragment growing, scaffold hopping, and side chain decoration.

conda env create -f environment.yml
conda activate cbgbench
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install pyg pytorch-scatter pytorch-cluster -c pyg
# install rdkit, efgs, obabel, etc.
pip install --use-pep517 EFGs
pip install biopython
pip install lxml
conda install rdkit openbabel tensorboard tqdm pyyaml easydict python-lmdb -c conda-forge
# install plip
mkdir tools
cd tools
git clone https://github.com/pharmai/plip.git
cd plip
python setup.py install
alias plip='python plip/plip/plipcmd.py'
cd ..
### Note that if there is an error in setup.py, it can be ignored as long as openbabel is installed.
# install docking tools
conda install -c conda-forge numpy swig boost-cpp sphinx sphinx_rtd_theme
python -m pip install git+https://github.com/Valdes-Tresanco-MS/AutoDockTools_py3
pip install meeko==0.1.dev3 scipy pdb2pqr vina
# If you are unable to install vina, you can try: conda install vina
# If you encounter the following error:
# ImportError: libtiff.so.5: cannot open shared object file: No such file or directory
# Please try the following steps to resolve it:
pip uninstall pillow
pip install pillow
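After installation, a quick import check can catch missing packages before training starts. The following is a minimal sketch and not part of the repository. The module names are assumed import names for the packages installed above (for example `torch_geometric` for `pyg`, `Bio` for `biopython`, `yaml` for `pyyaml`), and `check_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed top-level import names for the packages installed above.
REQUIRED_MODULES = [
    "torch", "torch_geometric", "torch_scatter", "torch_cluster",
    "rdkit", "openbabel", "Bio", "lxml", "lmdb", "yaml", "easydict",
    "tqdm", "vina", "meeko",
]


def check_modules(names):
    """Return {module name: bool}, True when the module can be located."""
    # find_spec locates a top-level module without importing it.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name:16s} {'ok' if found else 'MISSING'}")
```

Any module reported as missing points to the corresponding `conda install` or `pip install` line that needs to be rerun.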
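(i) Download the processed datasets from Google Drive. Each dataset corresponds to different methods and tasks; see the table below to find the processed data you need. Note that D3FG is a two-stage model that first generates functional groups and then generates linker atoms, so we name the two stage models "D3FG_FG" and "D3FG_Linker", and their training data differ.
Table: Data file names corresponding to different methods and tasks.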
| Model | Task | File name |
|---|---|---|
| Pocket2Mol | denovo | data/pl/pl/crossdocked_v1.1_rmsd1.0_pocket10_processed_fullatom.lmdb |
| GraphBP | denovo | data/pl/pl/crossdocked_v1.1_rmsd1.0_pocket10_processed_fullatom.lmdb |
| DiffSBDD | denovo | data/pl/pl/crossdocked_v1.1_rmsd1.0_pocket10_processed_fullatom.lmdb |
| DiffBP | denovo | data/pl/pl/crossdocked_v1.1_rmsd1.0_pocket10_processed_fullatom.lmdb |
| TargetDiff | denovo | data/pl/pl/crossdocked_v1.1_rmsd1.0_pocket10_processed_fullatom.lmdb |
| FLAG | denovo | data/pl_arfg/crossdocked_v1.1_rmsd1.0_pocket10_processed_arfuncgroup.lmdb |
| D3FG_FG | denovo | data/pl_fg/crossdocked_v1.1_rmsd1.0_pocket10_processed_funcgroup.lmdb |
| D3FG_Linker | denovo | data/pl/pl/crossdocked_v1.1_rmsd1.0_pocket10_processed_linker.lmdb |
| Pocket2Mol | frag | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_frag.lmdb |
| GraphBP | frag | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_frag.lmdb |
| DiffSBDD | frag | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_frag.lmdb |
| DiffBP | frag | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_frag.lmdb |
| TargetDiff | frag | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_frag.lmdb |
| Pocket2Mol | linker | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_linker.lmdb |
| GraphBP | linker | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_linker.lmdb |
| DiffSBDD | linker | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_linker.lmdb |
| DiffBP | linker | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_linker.lmdb |
| TargetDiff | linker | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_linker.lmdb |
| Pocket2Mol | scaffold | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_scaffold.lmdb |
| GraphBP | scaffold | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_scaffold.lmdb |
| DiffSBDD | scaffold | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_scaffold.lmdb |
| DiffBP | scaffold | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_scaffold.lmdb |
| TargetDiff | scaffold | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_scaffold.lmdb |
| Pocket2Mol | sidechain | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_sidechain.lmdb |
| GraphBP | sidechain | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_sidechain.lmdb |
| DiffSBDD | sidechain | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_sidechain.lmdb |
| DiffBP | sidechain | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_sidechain.lmdb |
| TargetDiff | sidechain | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_sidechain.lmdb |
(ii) Copy the datasets to the ./data directory. The complete data directory then looks like
- CBGBench
  - data
    - pl
      - crossdocked_name2id.pt
      - crossdocked_v1.1_rmsd1.0_pocket10_processed_fullatom.lmdb
      - crossdocked_v1.1_rmsd1.0_pocket10_processed_linker.lmdb
    - pl_arfg
      - crossdocked_name2id_arfuncgroup.pt
      - crossdocked_v1.1_rmsd1.0_pocket10_processed_arfuncgroup.lmdb
    - pl_decomp
      - crossdocked_name2id_frag.pt
      - crossdocked_name2id_linker.pt
      - crossdocked_name2id_scaffold.pt
      - crossdocked_name2id_sidechain.pt
      - crossdocked_v1.1_rmsd1.0_pocket10_processed_frag.lmdb
      - crossdocked_v1.1_rmsd1.0_pocket10_processed_linker.lmdb
      - crossdocked_v1.1_rmsd1.0_pocket10_processed_scaffold.lmdb
      - crossdocked_v1.1_rmsd1.0_pocket10_processed_sidechain.lmdb
    - pl_fg
      - crossdocked_name2id_funcgroup.pt
      - crossdocked_v1.1_rmsd1.0_pocket10_processed_funcgroup.lmdb
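To confirm that the copied files match the listing above, a small check like the following may help. It is an illustrative sketch rather than a repository script: `EXPECTED_FILES` simply transcribes the listing, and `missing_files` is a hypothetical helper name.

```python
from pathlib import Path

_PREFIX = "crossdocked_v1.1_rmsd1.0_pocket10_processed_"

# Transcribed from the directory listing above (paths relative to ./data).
EXPECTED_FILES = {
    "pl": [
        "crossdocked_name2id.pt",
        _PREFIX + "fullatom.lmdb",
        _PREFIX + "linker.lmdb",
    ],
    "pl_arfg": [
        "crossdocked_name2id_arfuncgroup.pt",
        _PREFIX + "arfuncgroup.lmdb",
    ],
    "pl_decomp": [
        "crossdocked_name2id_frag.pt",
        "crossdocked_name2id_linker.pt",
        "crossdocked_name2id_scaffold.pt",
        "crossdocked_name2id_sidechain.pt",
        _PREFIX + "frag.lmdb",
        _PREFIX + "linker.lmdb",
        _PREFIX + "scaffold.lmdb",
        _PREFIX + "sidechain.lmdb",
    ],
    "pl_fg": [
        "crossdocked_name2id_funcgroup.pt",
        _PREFIX + "funcgroup.lmdb",
    ],
}


def missing_files(data_root="./data", expected=EXPECTED_FILES):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [
        root / subdir / name
        for subdir, names in expected.items()
        for name in names
        # exists() accepts both plain files and LMDB directories.
        if not (root / subdir / name).exists()
    ]
```

Calling `missing_files()` from the repository root returns an empty list when every entry is in place. Only the subset needed for your chosen method and task has to be present, so a non-empty result is not necessarily an error.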
(i) Download crossdocked_v1.1_rmsd1.0.tar.gz from the TargetDiff drive, then extract it into ./raw_data/ using
mkdir raw_data
tar -xzvf crossdocked_v1.1_rmsd1.0.tar.gz -C ./raw_data
(ii) Run the following:
python ./scripts/extract_pockets.py --source raw_data/crossdocked_v1.1_rmsd1.0 --dest raw_data/crossdocked_v1.1_rmsd1.0_pocket10
(iii) During training, the dataset for each task is prepared automatically. Please refer to Training.
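The extraction in step (i) can also be done from Python with the standard-library `tarfile` module, which may be convenient on systems without GNU tar. This is a hedged sketch: `extract_archive` is a hypothetical helper name, not a repository function.

```python
import tarfile
from pathlib import Path


def extract_archive(archive, dest="./raw_data"):
    """Extract a .tar.gz archive into dest, creating dest if needed."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject unsafe
            # members such as absolute paths.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return dest_dir


# Example call (assumes the archive is in the current directory):
# extract_archive("crossdocked_v1.1_rmsd1.0.tar.gz", "./raw_data")
```

The extracted folder can then be passed to `extract_pockets.py` through `--source`, as in step (ii).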
Alternatively, you can download the processed case_study files from Google Drive and copy them to the ./data directory, which results in the following directory structure:
- CBGBench
  - data
    - case_study
      - processed
        - case_study_processed_fullatom.lmdb
        - case_study_processed_funcgroup.lmdb
        - case_study_name2id.pt
        ...
python train.py --config ./configs/{task}/train/{method}.yml --logdir ./logs/{task}/{method}
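{task} can be replaced with denovo, linker, frag, scaffold, or sidechain, and {method} can be replaced with a model name. The table below lists the method-task pairs and their replacements in detail.
Table: Method-task pairs for training from scratch.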
| Method | Task | {method} + {task} |
|---|---|---|
| Pocket2Mol | de novo | pocket2mol + denovo |
| GraphBP | de novo | graphbp + denovo |
| DiffSBDD | de novo | diffsbdd + denovo |
| DiffBP | de novo | diffbp + denovo |
| TargetDiff | de novo | targetdiff + denovo |
| FLAG | de novo | flag + denovo |
| D3FG | de novo | d3fg_fg + denovo; d3fg_linker + denovo |
| Pocket2Mol | linker design | pocket2mol + linker |
| GraphBP | linker design | graphbp + linker |
| DiffSBDD | linker design | diffsbdd + linker |
| DiffBP | linker design | diffbp + linker |
| TargetDiff | linker design | targetdiff + linker |
| Pocket2Mol | fragment growing | pocket2mol + frag |
| GraphBP | fragment growing | graphbp + frag |
| DiffSBDD | fragment growing | diffsbdd + frag |
| DiffBP | fragment growing | diffbp + frag |
| TargetDiff | fragment growing | targetdiff + frag |
| Pocket2Mol | scaffold hopping | pocket2mol + scaffold |
| GraphBP | scaffold hopping | graphbp + scaffold |
| DiffSBDD | scaffold hopping | diffsbdd + scaffold |
| DiffBP | scaffold hopping | diffbp + scaffold |
| TargetDiff | scaffold hopping | targetdiff + scaffold |
| Pocket2Mol | side chain decoration | pocket2mol + sidechain |
| GraphBP | side chain decoration | graphbp + sidechain |
| DiffSBDD | side chain decoration | diffsbdd + sidechain |
| DiffBP | side chain decoration | diffbp + sidechain |
| TargetDiff | side chain decoration | targetdiff + sidechain |
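Note that D3FG and FLAG are not compatible with the extended tasks, and that D3FG uses a two-stage training strategy. To train D3FG yourself, you therefore need to run: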
python train.py --config ./configs/denovo/train/d3fg_fg.yml --logdir ./logs/denovo/d3fg_fg
python train.py --config ./configs/denovo/train/d3fg_linker.yml --logdir ./logs/denovo/d3fg_linker
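The command template and the compatibility note above can be combined into a small helper that rejects unsupported method-task pairs before anything is launched. This is a sketch under assumptions: `train_command` is a hypothetical name, and the lowercase identifier `flag` is assumed to match the config file name in the same way `d3fg_fg` and `d3fg_linker` do.

```python
import shlex

TASKS = ("denovo", "linker", "frag", "scaffold", "sidechain")
# Per the note above, D3FG (both stages) and FLAG support only de novo.
# "flag" is an assumed identifier; check it against ./configs.
DENOVO_ONLY = ("d3fg_fg", "d3fg_linker", "flag")


def train_command(method, task):
    """Return the train.py command line for one method-task pair."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    if method in DENOVO_ONLY and task != "denovo":
        raise ValueError(f"{method} is only available for the denovo task")
    args = [
        "python", "train.py",
        "--config", f"./configs/{task}/train/{method}.yml",
        "--logdir", f"./logs/{task}/{method}",
    ]
    return shlex.join(args)


# Print the two D3FG stage commands.
for stage in ("d3fg_fg", "d3fg_linker"):
    print(train_command(stage, "denovo"))
```

The helper only builds the command string; it does not check that the corresponding config file exists.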
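TODO: table of pretrained models
Download the required pretrained .pt checkpoints and copy them into ./logs/, which results in the following directory structure: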
# TODO
- CBGBench
  - logs
    - denovo
      - d3fg_fg
        - pretrain
          - checkpoints
            - 951000.pt
      - d3fg_linker
        - pretrain
          - checkpoints
            - 4840000.pt
      - diffbp
        - pretrain
          - checkpoints
            - 4848000.pt
      ...
    - frag
      - diffbp
        - pretrain
          - checkpoints
            - 476000.pt
      ...
    ...
After training the models, you can draw samples from them on the test pockets with:
bash generate.sh --method {method} --task {task} --tag {tag} --checkpoint {ckpt_number}
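In the command, the {method} and {task} pairs can be found in the method-task table above. {tag} should be replaced with selftrain or pretrain, according to the checkpoint you use. If the --checkpoint argument is given without a number, the latest .pt file is found automatically. If a number is given, the specified checkpoint file is used.
For example, to generate samples on the test set with a self-trained TargetDiff model, run: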
bash generate.sh --method targetdiff --task denovo --tag selftrain --checkpoint
Or, to test the checkpoint saved at step 100000:
bash generate.sh --method targetdiff --task denovo --tag selftrain --checkpoint 100000
Note that D3FG uses a two-step generation strategy, so the commands should be
bash generate.sh --method d3fg_fg --task denovo --tag {tag} --checkpoint # generate the functional groups
bash generate.sh --method d3fg_linker --task denovo --tag {tag} --checkpoint # generate the linkers
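The checkpoint selection described above can be illustrated in Python. This is not the logic of `generate.sh` itself, which is not shown here. It is a sketch that assumes "latest" means the largest step number in the file name, and `resolve_checkpoint` is a hypothetical helper.

```python
from pathlib import Path


def resolve_checkpoint(ckpt_dir, number=None):
    """Pick a checkpoint file from ckpt_dir.

    With a number, return <number>.pt. Without one, return the .pt file
    whose numeric name is largest (assumed to be the latest step).
    """
    ckpt_dir = Path(ckpt_dir)
    if number is not None:
        path = ckpt_dir / f"{number}.pt"
        if not path.is_file():
            raise FileNotFoundError(path)
        return path
    candidates = [p for p in ckpt_dir.glob("*.pt") if p.stem.isdigit()]
    if not candidates:
        raise FileNotFoundError(f"no numbered .pt files in {ckpt_dir}")
    # Compare as integers so that 100000.pt ranks above 99000.pt.
    return max(candidates, key=lambda p: int(p.stem))
```

Comparing step numbers as integers matters here: a plain string sort would rank `99000.pt` after `100000.pt`.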
Run the following:
cd evaluate_scripts
bash evaluate.sh --method {method} --task {task} --tag {tag}
For example, if you trained TargetDiff yourself, run
bash evaluate.sh --method targetdiff --task denovo --tag selftrain
bash evaluate.sh --method targetdiff --task frag --tag selftrain
bash evaluate.sh --method targetdiff --task linker --tag selftrain
bash evaluate.sh --method targetdiff --task scaffold --tag selftrain
bash evaluate.sh --method targetdiff --task sidechain --tag selftrain
Or, if you have downloaded the pretrained models, run
cd evaluate_scripts
bash evaluate.sh --method {method} --task {task} --tag pretrained
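When one method has been trained on all five tasks, the per-task evaluation calls can be generated in a loop instead of being typed out. The sketch below is an assumption-laden convenience wrapper, not a repository script: `evaluate_all` is a hypothetical name, and it defaults to a dry run that only returns the commands.

```python
import subprocess

TASKS = ["denovo", "frag", "linker", "scaffold", "sidechain"]


def evaluate_all(method, tag, tasks=TASKS, dry_run=True):
    """Build (and optionally run) one evaluate.sh call per task."""
    commands = [
        ["bash", "evaluate.sh", "--method", method, "--task", task, "--tag", tag]
        for task in tasks
    ]
    if not dry_run:
        for cmd in commands:
            # evaluate.sh lives in evaluate_scripts/, so run it from there.
            subprocess.run(cmd, cwd="evaluate_scripts", check=True)
    return commands


for cmd in evaluate_all("targetdiff", "selftrain"):
    print(" ".join(cmd))
```

Passing `dry_run=False` from the repository root executes the commands in order and stops at the first failure because of `check=True`.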
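Use the pretrained models to flexibly generate molecules for real-world targets.
Include more models, such as 3DSBDD, DecompDiff, VoxMol, and LiGAN.
Release the pretrained models once the paper is published, after testing feasibility and reproducibility.
The codebase was initialized by Haitao Lin, with CBGBench v1 as our motivation. Subsequent maintenance of the code repository will be managed by Guojiang Zhao and his team at DP Technology.
If you find this repository helpful for your research or project, please cite our benchmark paper: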
@misc{lin2024cbgbench,
title={CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph},
author={Haitao Lin and Guojiang Zhao and Odin Zhang and Yufei Huang and Lirong Wu and Zicheng Liu and Siyuan Li and Cheng Tan and Zhifeng Gao and Stan Z. Li},
year={2024},
eprint={2406.10840},
archivePrefix={arXiv},
}
and our previous works, DiffBP and D3FG:
@misc{lin2024diffbpgenerativediffusion3d,
title={DiffBP: Generative Diffusion of 3D Molecules for Target Protein Binding},
author={Haitao Lin and Yufei Huang and Odin Zhang and Siqi Ma and Meng Liu and Xuanjing Li and Lirong Wu and Jishui Wang and Tingjun Hou and Stan Z. Li},
year={2024},
eprint={2211.11214},
archivePrefix={arXiv},
primaryClass={q-bio.BM},
url={https://arxiv.org/abs/2211.11214},
}
@inproceedings{
lin2023functionalgroupbased,
title={Functional-Group-Based Diffusion for Pocket-Specific Molecule Generation and Elaboration},
author={Haitao Lin and Yufei Huang and Odin Zhang and Yunfan Liu and Lirong Wu and Siyuan Li and Zhiyuan Chen and Stan Z. Li},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023},
url={https://openreview.net/forum?id=lRG11M91dx}
}
This project is licensed under the terms of the GPL-3.0 license.