Copyright (C) 2021 AutoML Groups Freiburg and Hannover
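While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, another trend in AutoML is to focus on neural architecture search. To bring the best of these two worlds together, we developed Auto-PyTorch, which jointly and robustly optimizes the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL).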
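Auto-PyTorch is mainly developed to support tabular data (classification, regression) and time series data (forecasting). The newest features in Auto-PyTorch for tabular data are described in the paper "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL" (see below for the bibtex reference). Details about Auto-PyTorch for multi-horizon time series forecasting tasks can be found in the paper "Efficient Automated Deep Learning for Time Series Forecasting" (also see below for the bibtex reference).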
You can also find the documentation here.
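Starting with v0.1.0, Auto-PyTorch has been updated to further improve usability, robustness and efficiency, using SMAC as the underlying optimization package and changing the code structure. Therefore, moving from v0.0.2 to v0.1.0 breaks compatibility. If you would like to use the old API, you can find it at master_old.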
A rough description of the Auto-PyTorch workflow is drawn in the following figure.
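In the figure, Data is provided by the user, and Portfolio is a set of neural network configurations that work well on diverse datasets. The current version only supports the greedy portfolio described in the paper Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL. This portfolio is used to warm-start the optimization of SMAC; in other words, we first evaluate the portfolio configurations on the provided data as the initial configurations. Then the API starts the following procedures: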
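Evaluate the baselines *1: train each algorithm, together with a dummy model from sklearn.dummy that represents the worst possible performance.
*1: Baselines are a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machines, used to solve either the regression or the classification task on the provided dataset.
*2: A pipeline hyperparameter configuration specifies the choice of components in each step (e.g. the target algorithm or the shape of the neural network) together with their corresponding hyperparameters.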
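The warm-start idea and the baseline step described above can be illustrated with a toy, scikit-learn-only sketch: measure a dummy baseline, evaluate every portfolio configuration first, then keep sampling new configurations and return the best observation. This is not Auto-PyTorch's implementation. It assumes k-NN configurations as a stand-in for neural network pipelines, plain random sampling in place of SMAC, and a fixed number of evaluations in place of the time budget; Hyperband and ensembling are left out. The function names are hypothetical.

```python
import random

from sklearn.datasets import load_digits
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)


def evaluate(model):
    """Mean 3-fold cross-validated accuracy of a scikit-learn estimator."""
    return cross_val_score(model, X, y, cv=3).mean()


def build_model(config):
    # Hypothetical search space: k-NN stands in for a neural network pipeline.
    return KNeighborsClassifier(n_neighbors=config["n_neighbors"], weights=config["weights"])


def sample_config(rng):
    # Plain random sampling, used here instead of SMAC's model-based proposals.
    return {"n_neighbors": rng.randint(1, 30), "weights": rng.choice(["uniform", "distance"])}


def warm_started_search(portfolio, n_random, seed=0):
    rng = random.Random(seed)
    # Baseline: a dummy model that always predicts the most frequent class.
    baseline = evaluate(DummyClassifier(strategy="most_frequent"))
    history = []
    # Warm start: evaluate every portfolio configuration before sampling new ones.
    for config in portfolio:
        history.append((config, evaluate(build_model(config))))
    # Then sample and record further observations until the evaluation budget is spent.
    for _ in range(n_random):
        config = sample_config(rng)
        history.append((config, evaluate(build_model(config))))
    best_config, best_score = max(history, key=lambda item: item[1])
    return baseline, history, best_config, best_score


portfolio = [{"n_neighbors": 3, "weights": "distance"}, {"n_neighbors": 15, "weights": "uniform"}]
baseline, history, best_config, best_score = warm_started_search(portfolio, n_random=3)
print(f"dummy baseline: {baseline:.3f}, best: {best_score:.3f} with {best_config}")
```

The dummy score gives a floor: any configuration that does not clearly beat it is not learning anything useful from the data.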
pip install autoPyTorch
Auto-PyTorch for time series forecasting requires additional dependencies:
pip install autoPyTorch[forecasting]
We recommend using Anaconda for development, as follows:
# Following commands assume the user is in a cloned directory of Auto-Pytorch
# We also need to initialize the automl_common repository as follows
# You can find more information about this here:
# https://github.com/automl/automl_common/
git submodule update --init --recursive
# Create the environment
conda create -n auto-pytorch python=3.8
conda activate auto-pytorch
conda install swig
python setup.py install
Similarly, to install all the dependencies for Auto-PyTorch time series forecasting:
git submodule update --init --recursive
conda create -n auto-pytorch python=3.8
conda activate auto-pytorch
conda install swig
pip install -e ".[forecasting]"
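After either installation route, a quick way to see whether the packages are importable in the active environment is to ask the import system for each top-level module. This is a minimal sketch using only the standard library; checking sktime is an assumption based solely on the forecasting example importing it, and the helper name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # sktime is checked only because the forecasting example imports it.
    missing = missing_modules(["autoPyTorch", "sktime"])
    print("missing:", missing or "none")
```

find_spec locates a module without executing it, so the check stays cheap even for large packages.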
In a nutshell:
from autoPyTorch.api.tabular_classification import TabularClassificationTask
# data and metric imports
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)
# initialise Auto-PyTorch api
api = TabularClassificationTask()
# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
    optimize_metric='accuracy',
    total_walltime_limit=300,
    func_eval_time_limit_secs=50
)
# Calculate test accuracy
y_pred = api.predict(X_test)
score = api.score(y_pred, y_test)
print("Accuracy score", score)
Time series forecasting task
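The forecasting example below takes the last forecasting_horizon steps of every series as test targets, uses the preceding steps as training targets, and passes the rows of X_test as known future features. A minimal pandas sketch of that convention, independent of Auto-PyTorch; the helpers split_last_horizon and mask_unknown_future are hypothetical and not part of the library:

```python
import numpy as np
import pandas as pd


def split_last_horizon(series_list, forecasting_horizon):
    """Split each series: the last `forecasting_horizon` steps become test targets."""
    y_train = [s.iloc[:-forecasting_horizon] for s in series_list]
    y_test = [s.iloc[-forecasting_horizon:] for s in series_list]
    return y_train, y_test


def mask_unknown_future(features, known_columns):
    """Keep the known future features and replace every other column with NaN."""
    masked = features.copy()
    for column in masked.columns:
        if column not in known_columns:
            masked[column] = np.nan
    return masked


# Two series of different lengths, each split with a horizon of 3.
y_train, y_test = split_last_horizon(
    [pd.Series(np.arange(10, dtype=float)), pd.Series(np.arange(6, dtype=float))], 3
)
future = mask_unknown_future(pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}), ["a"])
```

Using .iloc makes the split explicitly positional, which matters when the index is a PeriodIndex, as with the longley data (hence the to_timestamp() call in the example).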
from autoPyTorch.api.time_series_forecasting import TimeSeriesForecastingTask
# data and metric imports
from sktime.datasets import load_longley
targets, features = load_longley()
# define the forecasting horizon
forecasting_horizon = 3
# The dataset optimized by APT-TS can be a list of np.ndarray/pd.DataFrame, where each series is one element of the
# list, or a single pd.DataFrame that records the series index information, i.e. to which series each time step
# belongs. This id can be stored as the DataFrame's index or as a separate column.
# Within each series, the last forecasting_horizon steps are taken as test targets and the steps before them as
# training targets. Normally, the values to be forecasted should directly follow the training set.
y_train = [targets[:-forecasting_horizon]]
y_test = [targets[-forecasting_horizon:]]
# Same for the features. For univariate models, X_train and X_test can be omitted and set to None.
X_train = [features[:-forecasting_horizon]]
# Here X_test holds the 'known future features': features whose future values are known in advance. Unknown
# features could be replaced with NaN or zeros (they will not be used by our networks). If no feature is known
# beforehand, X_test can also be omitted.
known_future_features = list(features.columns)
X_test = [features[-forecasting_horizon:]]
start_times = [targets.index.to_timestamp()[0]]
freq = '1Y'
# initialise Auto-PyTorch api
api = TimeSeriesForecastingTask()
# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    optimize_metric='mean_MAPE_forecasting',
    n_prediction_steps=forecasting_horizon,
    memory_limit=16 * 1024,  # forecasting models currently use much more memory
    freq=freq,
    start_times=start_times,
    func_eval_time_limit_secs=50,
    total_walltime_limit=60,
    min_num_test_instances=1000,  # proxy validation sets; only applies to tasks with more than 1000 series
    known_future_features=known_future_features,
)
# our dataset can directly generate test sequences for new datasets
test_sets = api.dataset.generate_test_seqs()
# Calculate the test score
y_pred = api.predict(test_sets)
score = api.score(y_pred, y_test)
print("Forecasting score", score)
For more examples, including customising the search space and parallelising the code, check out the examples folder:
$ cd examples/
The code for the paper is available under examples/ensemble in the branch TPAMI.2021.3067763.
If you want to contribute to Auto-PyTorch, clone the repository and check out our current development branch:
$ git checkout development
This program is free software: you can redistribute it and/or modify it under the terms of the Apache License 2.0 (please see the LICENSE file).
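This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
You should have received a copy of the Apache License 2.0 along with this program (see the LICENSE file).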
Please see the branch TPAMI.2021.3067763 to reproduce the paper Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL.
@article{zimmer-tpami21a,
  author = {Lucas Zimmer and Marius Lindauer and Frank Hutter},
  title = {Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year = {2021},
  note = {also available under https://arxiv.org/abs/2006.13799},
  pages = {3079--3090}
}

@incollection{mendoza-automlbook18a,
  author = {Hector Mendoza and Aaron Klein and Matthias Feurer and Jost Tobias Springenberg and Matthias Urban and Michael Burkart and Max Dippel and Marius Lindauer and Frank Hutter},
  title = {Towards Automatically-Tuned Deep Neural Networks},
  year = {2018},
  month = dec,
  editor = {Hutter, Frank and Kotthoff, Lars and Vanschoren, Joaquin},
  booktitle = {AutoML: Methods, Systems, Challenges},
  publisher = {Springer},
  chapter = {7},
  pages = {141--156}
}

@inproceedings{deng-ecml22,
  author = {Difan Deng and Florian Karl and Frank Hutter and Bernd Bischl and Marius Lindauer},
  title = {Efficient Automated Deep Learning for Time Series Forecasting},
  year = {2022},
  booktitle = {Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, {ECML} {PKDD} 2022},
  url = {https://doi.org/10.48550/arXiv.2205.05511},
}
Auto-PyTorch is developed by the AutoML Groups of the University of Freiburg and Hannover.