UniDL4BioPep

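This is the implementation of the paper by Du, Z., Ding, X., Xu, Y., and Li, Y. (2023), "UniDL4BioPep: a universal deep learning architecture for binary classification in peptide bioactivity", Briefings in Bioinformatics, bbad135. The web server is available at server_link.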

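Update: Xingjian Ding has released a PyTorch version of UniDL4BioPep-ASL for imbalanced datasets. It uses a different loss function (asymmetric loss, a modified version of focal loss) that allows the positive and negative classes to be tuned at the same time.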

2024-11-21 update: All the datasets used in this study have been uploaded to the corresponding folders. Label 1 denotes the positive class for each property (for example: 1 is toxic, 0 is non-toxic; 1 is allergenic, 0 is non-allergenic; 1 is bitter, 0 is non-bitter; 1 is antimicrobial, 0 is non-antimicrobial, etc.). The training and test datasets follow the split used in the original references; if a folder contains only one dataset, the original reference did not provide a split. In addition to the initial 20 datasets, we added two more (allergenic proteins and peptides, and cell-penetrating peptides). The models for these two datasets were also developed with the UniDL4BioPep architecture and are available on our web server.

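2024-01-06 update: We added probability information to the prediction results. Each prediction now reports "active" or "non-active" together with a value (for example, 0.98) giving the probability predicted by our model, which makes the results easier to interpret. (Upload your file and run a prediction to see the new feature!)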

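2023-07-04 update: We redesigned the template file (Pretrained_model_usage_template.ipynb). It now automatically uses your GPU resources to accelerate peptide embedding and model prediction. Thank you for any feedback on this project.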

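2023-05-07 update: We added a newly designed template (GPU_UniDL4BioPep_template_for_other_bioactivity.ipynb). It automatically detects your GPU, if one is available, and uses it to accelerate peptide embedding and model fitting. We also added a section that converts FASTA-format files to CSV files.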

Update: We added an advanced version (UniDL4BioPep-FL) that uses the focal loss function for imbalanced datasets, along with a template for your use (UniDL4BioPep_FL_template_for_other_bioactivity.ipynb).

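Using UniDL4BioPep-FL for imbalanced datasets: assign your minority class as the positive group (labeled 1) and the majority class as the negative group (labeled 0). Suggested hyperparameter tuning: gamma (0, 1, 2, 3, 4, 5) and pos_weight (0.1, 0.2, ... 1.0), or leave pos_weight unspecified.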
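To make the roles of gamma and pos_weight concrete, here is a minimal NumPy sketch of a binary focal loss. It is an illustration only, not the code used in the UniDL4BioPep-FL template: the function name binary_focal_loss, the clipping constant eps, and the example labels and probabilities are assumptions, and the weighting convention (pos_weight scales only the positive-class term) follows the common formulation of focal loss.

```python
import numpy as np

def binary_focal_loss(y_true, p_pred, gamma=2.0, pos_weight=1.0, eps=1e-7):
    """Mean binary focal loss (hypothetical helper, for illustration only).

    y_true:     0/1 labels, with 1 = minority (positive) class.
    p_pred:     predicted probability of the positive class.
    gamma:      focusing parameter; gamma=0 gives weighted cross-entropy.
    pos_weight: multiplier applied to the positive-class term.
    """
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log() never sees exactly 0 or 1.
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    pos_term = -pos_weight * y * (1.0 - p) ** gamma * np.log(p)
    neg_term = -(1.0 - y) * p ** gamma * np.log(1.0 - p)
    return float(np.mean(pos_term + neg_term))

# Made-up example: one positive among three negatives.
y = [1, 0, 0, 0]
p = [0.3, 0.1, 0.2, 0.6]
for g in (0, 1, 2, 3, 4, 5):
    print(g, round(binary_focal_loss(y, p, gamma=g), 4))
```

Larger gamma values shrink the contribution of examples the model already classifies confidently, which is why sweeping gamma over a small grid is a reasonable first tuning step.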

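Note: The model can also be used for multi-class classification (we use a softmax function in the final output layer), so you only need to change the number of nodes in the output layer. (Feel free to contact me or submit your question in the Issues section.)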
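As a rough illustration of why only the output layer needs to change, the NumPy sketch below applies a softmax output layer with a configurable number of nodes. It models the final layer only, not the full UniDL4BioPep network; the variable names, the random weights, and the choice of three classes are assumptions, while the 320-dimensional input matches the ESM-2 embedding size used elsewhere in this README.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_features = 320  # size of one peptide embedding
n_classes = 3     # assumption: set this to the number of classes in your task

# Hypothetical output-layer parameters; a trained model would learn these.
W = rng.normal(scale=0.01, size=(n_features, n_classes))
b = np.zeros(n_classes)

X = rng.normal(size=(5, n_features))  # five made-up embeddings
probs = softmax(X @ W + b)            # one probability per class, per peptide
pred = probs.argmax(axis=1)           # predicted class index
print(probs.shape, pred)
```

With n_classes = 2 this reduces to the binary case, so the rest of the pipeline can stay as it is.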

Update: A web server with 26 advanced models is available at https://nepc2pvmzy.us-east-1.awsapprunner.com/; the web server development repository is available at UniDL4BioPep_webserver.

Note: UniDL4BioPep is free for academic research only; for commercial use, please contact us at [email protected]; [email protected]; [email protected]

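If this content is useful to you, please kindly star it and cite it. Please cite as: Du, Z., Ding, X., Xu, Y., & Li, Y. (2023). UniDL4BioPep: a universal deep learning architecture for binary classification in peptide bioactivity. Briefings in Bioinformatics, bbad135.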

Requirements

The major dependencies used in this project are as follows:

Python 3.8.16
fair-esm 2.0.0
keras 2.9.0
pandas 1.3.5
numpy 1.21.6
scikit-learn 1.0.2
tensorflow 2.9.2
torch 1.13.0+cu116
focal-loss

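A more detailed list of the Python libraries used in this project is given in requirements.txt. All of the implementations can be run in Google Colab; all you need is a browser and a Google account. Install each of the packages above with !pip install package_name==version (for example, !pip install fair-esm==2.0.0).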
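After installing, it can help to confirm that the environment matches the pinned versions. The sketch below uses only the standard library; the helper name check_versions is an assumption, and the PINNED table is copied from the dependency list above (focal-loss is left out because no version is given for it).

```python
from importlib import metadata

# Pinned versions copied from the dependency list above.
PINNED = {
    "fair-esm": "2.0.0",
    "keras": "2.9.0",
    "pandas": "1.3.5",
    "numpy": "1.21.6",
    "scikit-learn": "1.0.2",
    "tensorflow": "2.9.2",
    "torch": "1.13.0+cu116",
}

def check_versions(pinned):
    """Return {package: (installed_or_None, wanted)} for every mismatch."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches

for name, (installed, wanted) in check_versions(PINNED).items():
    print(f"{name}: installed={installed}, expected={wanted}")
```

A mismatch is not necessarily a problem, since newer Colab images ship newer libraries, but it is a useful first thing to check when a notebook cell fails.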

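Usage

Note: all my datasets use 0 and 1 to represent positive and negative, respectively. Again, 0 is positive and 1 is negative. The 2024-11-21 update, however, states that label 1 is the positive class in the uploaded datasets, so check the labels in the files you actually use.

Using the pretrained models on your own dataset

Just check the file Pretrained_model_usage_template.ipynb

All you need to do is prepare the data you want predictions for as an XLSX file and open Pretrained_model_usage_template.ipynb in Google Colab. Then upload your data and the training dataset (used for model training), and you are ready to go.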

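Training your own model with UniDL4BioPep

All you need to do is prepare your dataset in XLSX format with two columns (the first column is the sequence and the second column is the label). You can download an XLSX dataset file from any folder in this repository to use as a reference. Before loading your datasets, feel free to shuffle them and split them into training and test datasets as you require.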
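Before embedding, a quick sanity check of the two-column table can catch formatting problems early. The following is a minimal sketch, assuming the columns are named sequence and label as in the loading code in this README; the helper validate_dataset and the example peptides are made up for illustration.

```python
import pandas as pd

def validate_dataset(df):
    """Return a list of problems found in a sequence/label table (empty if none)."""
    problems = [f"missing column: {col}"
                for col in ("sequence", "label") if col not in df.columns]
    if problems:
        return problems
    if (df["sequence"].astype(str).str.strip() == "").any():
        problems.append("empty sequence found")
    if not set(df["label"].unique()) <= {0, 1}:
        problems.append("labels must be 0 or 1")
    return problems

# Made-up example rows; "NA" (Asn-Ala) is a valid dipeptide, not a missing value.
demo = pd.DataFrame({"sequence": ["AKLVFF", "NA", "GLFDIV"], "label": [1, 0, 1]})
print(validate_dataset(demo))

# Shuffle the rows reproducibly before splitting.
shuffled = demo.sample(frac=1, random_state=123).reset_index(drop=True)
print(shuffled)
```

When the table is read from a file, passing na_filter=False to pd.read_excel keeps a sequence such as "NA" from being parsed as a missing value.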

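You can also split your dataset in Python with the code below and rewrite the dataset loading and embedding section accordingly. Just replace that section with the following code.

Update: I added a new section to UniDL4BioPep_template_for_other_bioactivity.ipynb that handles loading and embedding for a single XLSX dataset (just use it).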

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Load the whole dataset; na_filter=False keeps a sequence such as "NA" from being read as a missing value
dataset = pd.read_excel('whole_sample_dataset.xlsx', na_filter=False)

# Generate the peptide embeddings one sequence at a time.
# esm_embeddings() is expected to be defined earlier in the template notebook.
sequence_list = dataset['sequence']
embeddings_results = pd.DataFrame()
for seq in sequence_list:
    # The ESM model expects a list of (name, sequence) tuples; the sequence doubles as its own name
    peptide_sequence_list = [(seq, seq)]
    one_seq_embeddings = esm_embeddings(peptide_sequence_list)
    embeddings_results = pd.concat([embeddings_results, one_seq_embeddings])
# Save the embeddings in CSV format
embeddings_results.to_csv('whole_sample_dataset_esm2_t6_8M_UR50D_unified_320_dimension.csv')

# Load the labels (y) for model development
y = np.array(dataset['label'])  # converted to np.array for the CNN model

# Read the peptide embeddings back
X_data_name = 'whole_sample_dataset_esm2_t6_8M_UR50D_unified_320_dimension.csv'
X_data = pd.read_csv(X_data_name, header=0, index_col=0, delimiter=',')
X = np.array(X_data)

# Split into training and test datasets at a ratio of 8:2
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

# Normalize the X data range; the scaler is fitted on the training set only
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)  # normalize X to the 0-1 range
X_test = scaler.transform(X_test)

After the transformation, you are all set. Note: if you run into a runtime error, check the dimensions of your datasets.

# Check the dimensions of the datasets before model development
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
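The splitting code above calls esm_embeddings, which is not defined in this README. The following is a hedged sketch of what such a helper could look like, based on the public fair-esm API; it is not copied from the template notebook, and details such as the lazy imports, the mean pooling over residues, and the returned column layout are assumptions. It returns one 320-dimensional row per input sequence.

```python
import pandas as pd

def esm_embeddings(peptide_sequence_list):
    """Embed (name, sequence) tuples with ESM-2 t6_8M; returns one row per sequence.

    Sketch only: the model is reloaded on every call, which is simple but slow.
    """
    # Imported lazily so the heavy dependencies load only when embeddings are computed.
    import torch
    import esm

    model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
    batch_converter = alphabet.get_batch_converter()
    model.eval()  # disable dropout for deterministic results

    _, _, batch_tokens = batch_converter(peptide_sequence_list)
    # Number of non-padding tokens per sequence (includes the start and end tokens).
    batch_lens = (batch_tokens != alphabet.padding_idx).sum(1)

    with torch.no_grad():
        results = model(batch_tokens, repr_layers=[6], return_contacts=False)
    token_representations = results["representations"][6]

    # Average over residues only, skipping the start token and the end token.
    rows = [
        token_representations[i, 1:int(n) - 1].mean(0).tolist()
        for i, n in enumerate(batch_lens)
    ]
    return pd.DataFrame(rows)
```

Loading the model once outside the function and passing it in would be faster for large datasets; the per-call loading here simply keeps the sketch self-contained.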

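Further model tuning and modification

Feel free to make your own modifications. Just scroll down to the model architecture section and revise it to suit your needs.

In my experiments this architecture performed quite well, so you may need to make major changes if you want substantially different behavior.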
