UniDL4BioPep

This repository is the implementation of the paper by Du, Z., Ding, X., Xu, Y., and Li, Y., Briefings in Bioinformatics, bbad135. A web server is available at server_link.

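Update: Xingjian Ding released a PyTorch version, UniDL4BioPep-ASL, for imbalanced datasets. It uses a different loss function (the asymmetric loss, a modified version of the focal loss), which can tune the focusing on positive and negative samples at the same time.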
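For readers unfamiliar with the asymmetric loss mentioned above, here is a minimal NumPy sketch of its usual binary form. It is not the UniDL4BioPep-ASL implementation: the function name asymmetric_loss and the parameters gamma_pos, gamma_neg, and margin (with their defaults) are illustrative assumptions, and the PyTorch repository may use different names and values.

```python
import numpy as np

def asymmetric_loss(y_true, p, gamma_pos=0.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """Mean binary asymmetric loss (illustrative sketch; names and defaults are hypothetical).

    y_true: array of 0/1 labels; p: predicted probability of the positive class.
    """
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    # Shift the probabilities used for negatives down by the margin,
    # so that very easy negatives contribute zero loss
    p_neg = np.clip(p - margin, 0.0, 1.0 - eps)
    # Separate focusing exponents for the positive and the negative term
    loss_pos = y_true * (1.0 - p) ** gamma_pos * np.log(p)
    loss_neg = (1.0 - y_true) * p_neg ** gamma_neg * np.log(1.0 - p_neg)
    return float(-(loss_pos + loss_neg).mean())
```

With gamma_pos equal to gamma_neg and margin set to 0 this reduces to the focal loss, and with both exponents at 0 it reduces to ordinary binary cross-entropy.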

2024-11-21 update: all the datasets used in this study have been uploaded to the corresponding folders. Label 1 denotes the positive class for each property (for example: 1 is toxic, 0 is non-toxic; 1 is allergenic, 0 is non-allergenic; 1 is bitter, 0 is non-bitter; 1 is antimicrobial, 0 is non-antimicrobial; and so on). The training and test datasets follow the splits of the original references; if a folder contains only one dataset, the original reference did not provide a split. In addition to the original 20 datasets, we added two more (allergenic proteins and peptides, and cell-penetrating peptides). The models for these two were also developed with the UniDL4BioPep architecture and are deployed on our web server.

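2024-01-06 update: we added probability information to the prediction results. Each prediction is now reported as active or non-active together with a probability (for example, 0.98) that indicates how confident the model is, which makes the results easier to interpret. (Upload your file and run a prediction to see the new feature.)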
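The label-plus-probability output described above could be produced along the following lines. This is a sketch only: format_predictions, the positive_index parameter, and the exact label strings are assumptions made for illustration, and the web server may format its results differently.

```python
import numpy as np

def format_predictions(probabilities, positive_index=1):
    """Turn an (n_samples, 2) array of class probabilities into strings like 'active (0.98)'.

    positive_index is the column holding the positive (active) class. It is an
    assumption here and must match the label convention of the model you use.
    """
    probabilities = np.asarray(probabilities, dtype=float)
    predicted = probabilities.argmax(axis=1)
    results = []
    for row, cls in zip(probabilities, predicted):
        label = "active" if cls == positive_index else "non-active"
        # Report the probability of the predicted class, rounded to two decimals
        results.append(f"{label} ({row[cls]:.2f})")
    return results
```

For example, format_predictions([[0.02, 0.98]]) labels the single peptide as active and attaches the probability of that class.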

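2023-07-04 update: we redesigned the template file (Pretrained_model_usage_template.ipynb). It now automatically uses your GPU, if one is available, to accelerate peptide embedding and model prediction. Any feedback on this project is appreciated.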

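2023-05-07 update: we added a newly designed template (gpu_unidl4biopep_template_for_other_bioactivity.ipynb). It automatically detects your GPU, if available, and uses it to accelerate peptide embedding and model fitting. A section for converting FASTA-format files to CSV files was also added.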
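As a rough idea of what a FASTA-to-CSV conversion involves, here is a minimal sketch using only the standard library. It is not the code from the template notebook; the function name fasta_to_csv and the output column names name and sequence are assumptions.

```python
import csv

def fasta_to_csv(fasta_path, csv_path):
    """Write one CSV row per FASTA record with columns 'name' and 'sequence'.

    Returns the number of records written.
    """
    records = []
    name, chunks = None, []
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A header line closes the previous record and starts a new one
                if name is not None:
                    records.append((name, "".join(chunks)))
                name, chunks = line[1:], []
            elif name is not None:
                # Sequence lines may be wrapped, so collect and join them
                chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["name", "sequence"])
        writer.writerows(records)
    return len(records)
```

Calling fasta_to_csv("peptides.fasta", "peptides.csv") (both file names are placeholders) produces a CSV that pandas can read directly.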

Update: we added an advanced version (UniDL4BioPep-FL) that uses the focal loss function for imbalanced datasets, together with a template for your own use (unidl4biopep_fl_template_for_other_bioactivity.ipynb).

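Using UniDL4BioPep-FL on imbalanced datasets: choose the minority group as the positive group (labeled 1) and the majority group as the negative group (labeled 0). Suggested hyperparameter tuning: gamma (0, 1, 2, 3, 4, 5) and pos_weight (0.1, 0.2, ..., 1.0), or leave pos_weight unspecified.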
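To make the roles of gamma and pos_weight concrete, the sketch below writes out a binary focal loss in NumPy and builds the suggested search grid. It assumes the common form in which pos_weight scales the positive-class term; the focal-loss package listed in the requirements provides its own implementation, and the function name binary_focal_loss here is illustrative.

```python
import numpy as np

def binary_focal_loss(y_true, p, gamma=2.0, pos_weight=None, eps=1e-8):
    """Mean binary focal loss (illustrative sketch, not the focal-loss package code).

    y_true: array of 0/1 labels; p: predicted probability of the positive class.
    """
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    w = 1.0 if pos_weight is None else pos_weight
    # gamma down-weights well-classified samples; pos_weight rescales the positive term
    loss_pos = -w * y_true * (1.0 - p) ** gamma * np.log(p)
    loss_neg = -(1.0 - y_true) * p ** gamma * np.log(1.0 - p)
    return float((loss_pos + loss_neg).mean())

# Candidate grid from the suggestion above; None leaves pos_weight unspecified
gammas = [0, 1, 2, 3, 4, 5]
pos_weights = [None] + [round(0.1 * k, 1) for k in range(1, 11)]
grid = [(g, w) for g in gammas for w in pos_weights]
```

Each (gamma, pos_weight) pair in grid can then be used to train one model, keeping the pair that scores best on a validation set. With gamma set to 0 and no pos_weight, the loss is plain binary cross-entropy.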

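Note: the model can also be used for multi-class classification (the last output layer uses a softmax function), so you only need to change the number of nodes in the output layer. (Feel free to contact me or open an issue with your questions.)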
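The sketch below illustrates why only the output layer has to change for multi-class problems: a softmax layer with n_classes nodes yields one probability per class whatever the number of classes. It uses plain NumPy with random weights rather than the Keras layers in the notebooks, and the names output_layer, n_features, and n_classes are assumptions.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def output_layer(features, weights, bias):
    """Dense output layer followed by softmax; weights has shape (n_features, n_classes)."""
    return softmax(np.asarray(features) @ weights + bias)

rng = np.random.default_rng(0)
n_features, n_classes = 8, 3           # n_classes = 2 gives the binary case
features = rng.normal(size=(5, n_features))
weights = rng.normal(size=(n_features, n_classes))
bias = np.zeros(n_classes)

probs = output_layer(features, weights, bias)   # shape (5, n_classes), rows sum to 1
predicted_class = probs.argmax(axis=1)          # index of the most probable class
```

Labels for a multi-class model would correspondingly take values 0 to n_classes - 1 instead of 0 and 1.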

Update: an upgraded web server with 26 models is available at https://nepc2pvmzy.us-east-1.awsapprunner.com/; the web server development repository is available at unidl4biopep_webserver.

Note: UniDL4BioPep is free for academic research only. For commercial use, please contact us: [email protected]; [email protected]; [email protected]

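If you find this content useful, please kindly cite it. Cite as: Du, Z., Ding, X., Xu, Y., & Li, Y. (2023). UniDL4BioPep: a universal deep learning architecture for binary classification in peptide bioactivity. Briefings in Bioinformatics, bbad135.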

Requirements

The major dependencies used in this project are as follows:

Python 3.8.16
fair-esm 2.0.0
keras 2.9.0
pandas 1.3.5
numpy 1.21.6
scikit-learn 1.0.2
tensorflow 2.9.2
torch 1.13.0+cu116
focal-loss

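The full list of Python libraries used in this project is given in requirements.txt. Everything can be run in Google Colab; all you need is a browser and a Google account. Install the packages above with !pip install package_name==version (for example, !pip install fair-esm==2.0.0).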
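After installing, it can help to confirm that the environment matches the pinned versions. The following sketch uses the standard-library importlib.metadata for that; the helper name check_versions is hypothetical, and torch is left out because its pin (1.13.0+cu116) depends on the CUDA build.

```python
from importlib import metadata

# Pinned versions copied from the dependency list above
# (torch omitted because its pin is CUDA-specific)
PINNED = {
    "fair-esm": "2.0.0",
    "keras": "2.9.0",
    "pandas": "1.3.5",
    "numpy": "1.21.6",
    "scikit-learn": "1.0.2",
    "tensorflow": "2.9.2",
}

def check_versions(pinned=PINNED):
    """Return {package: (expected, installed)} for packages that are missing or differ."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches

for name, (expected, installed) in check_versions().items():
    print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every listed package is installed at the pinned version. Newer versions may still work, so treat a mismatch as a hint rather than an error.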

Usage

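Note: all my datasets use 0 and 1 to represent positive and negative, respectively; that is, 0 is positive and 1 is negative. The dataset files described in the 2024-11-21 update use the opposite convention (1 is positive), so check the label column of the file you are using.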

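Use the pretrained models on your own dataset

Just check the file Pretrained_model_usage_template.ipynb.

All you need to do is prepare your data for prediction as an XLSX file and open Pretrained_model_usage_template.ipynb in Google Colab. Then upload your data together with the training dataset (the one used for model training), and you are ready to go.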

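Train your own model with UniDL4BioPep

All you need to do is prepare your dataset as an XLSX file with two columns (the first column is the sequence, the second is the label). You can download an XLSX dataset file from any folder in this repository as an example. Before loading the dataset, feel free to shuffle it and split it into training and test datasets as you require.

You can also split the dataset in Python with the code below and then rewrite the dataset loading and embedding section of the notebook. Simply replace that section with the following code.

Update: I added a new section to unidl4biopep_template_for_other_bioactivity.ipynb that handles loading and embedding when you have a single XLSX dataset (just use it).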

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Load the whole dataset; na_filter=False keeps a sequence such as "NA" from being read as a missing value
dataset = pd.read_excel('whole_sample_dataset.xlsx', na_filter=False)

# Generate the peptide embeddings.
# esm_embeddings() is assumed to be defined earlier in the template notebook.
embeddings_list = []
for seq in dataset['sequence']:
    # The ESM model expects a list of (name, sequence) tuples; the sequence doubles as its own name
    embeddings_list.append(esm_embeddings([(seq, seq)]))
embeddings_results = pd.concat(embeddings_list)
# Save the embeddings in CSV format
embeddings_results.to_csv('whole_sample_dataset_esm2_t6_8M_UR50D_unified_320_dimension.csv')

# Load the labels as a NumPy array for the CNN model
y = np.array(dataset['label'])

# Read the peptide embeddings back from the CSV file
X_data_name = 'whole_sample_dataset_esm2_t6_8M_UR50D_unified_320_dimension.csv'
X_data = pd.read_csv(X_data_name, header=0, index_col=0, delimiter=',')
X = np.array(X_data)

# Split the dataset into training and test sets at a ratio of 8:2
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

# Scale X to the 0-1 range; the scaler is fitted on the training set only
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

After this conversion, you are all set. Note: if you run into errors, check the dimensions of your datasets.

# Check the dimensions of the datasets before model development
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
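Beyond printing the shapes, the check can be made explicit. The sketch below raises a readable error when the arrays do not line up; the helper name check_shapes is hypothetical, and the default of 320 features is taken from the embedding file name used above (esm2_t6_8M_UR50D, 320 dimensions).

```python
import numpy as np

def check_shapes(X_train, X_test, y_train, y_test, n_features=320):
    """Raise ValueError with a readable message if the arrays do not line up."""
    for name, X, y in (("train", X_train, y_train), ("test", X_test, y_test)):
        X, y = np.asarray(X), np.asarray(y)
        # Embeddings must be a 2-D table: one row per peptide, one column per feature
        if X.ndim != 2:
            raise ValueError(f"X_{name} must be 2-D, got shape {X.shape}")
        if X.shape[1] != n_features:
            raise ValueError(f"X_{name} has {X.shape[1]} features, expected {n_features}")
        # Every peptide needs exactly one label
        if y.shape[0] != X.shape[0]:
            raise ValueError(f"X_{name} has {X.shape[0]} rows but y_{name} has {y.shape[0]} labels")
```

Calling check_shapes(X_train, X_test, y_train, y_test) right after the split catches mismatches before they surface as harder-to-read errors during model fitting.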

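Further model tuning and modification

Feel free to make your own modifications. Just scroll down to the model architecture section and revise it to fit your needs.

In my experiments this architecture performed quite well, so you may need to make major changes if you want substantially different behavior.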

Additional information

Version 1.0.0
Type AI source code
Updated 2025-09-07
Size 171.78MB
Source GitHub
