Copyright (C) 2021 AutoML Groups Freiburg and Hannover
While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, another trend in AutoML is to focus on neural architecture search. To bring the best of both worlds together, we developed Auto-PyTorch, which jointly and robustly optimizes the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL).
Auto-PyTorch is mainly developed to support tabular data (classification, regression) and time series data (forecasting). The newest features of Auto-PyTorch for tabular data are described in the paper "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL" (see below for its BibTeX reference). Details on Auto-PyTorch for multi-horizon time series forecasting tasks can be found in the paper "Efficient Automated Deep Learning for Time Series Forecasting" (also referenced below).
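For tabular regression, the API mirrors the classification workflow shown further below. The following is only a minimal sketch, assuming that TabularRegressionTask exposes the same search/predict/score interface and accepts 'r2' as a metric name; check the documentation for the exact signatures.

# Minimal tabular regression sketch (assumes TabularRegressionTask mirrors the
# classification API shown below; treat metric names and arguments as assumptions).
import sklearn.datasets
import sklearn.model_selection
from autoPyTorch.api.tabular_regression import TabularRegressionTask

X, y = sklearn.datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)

api = TabularRegressionTask()
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
    optimize_metric='r2',
    total_walltime_limit=300,
    func_eval_time_limit_secs=50,
)
y_pred = api.predict(X_test)
print("R2 score", api.score(y_pred, y_test))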
Also, find the documentation here.
From v0.1.0 onwards, Auto-PyTorch has been updated to further improve usability, robustness and efficiency by using SMAC as the underlying optimization package and by restructuring the code. As a result, moving from v0.0.2 to v0.1.0 breaks compatibility. If you would like to use the old API, you can find it on the master_old branch.
A rough description of the Auto-PyTorch workflow is drawn in the figure below.
In the figure, the data is provided by the user, and the portfolio is a set of neural network configurations that work well on a diverse set of datasets. The current version only supports the greedy portfolio described in the paper "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL"; this portfolio is used to warm-start the optimization with SMAC. In other words, we evaluate the portfolio on the provided data as the initial configurations. The API then starts the following procedure:
As part of this procedure, the baselines *1 are evaluated: each algorithm is trained along with a dummy model from sklearn.dummy that represents the worst possible performance. SMAC then samples pipeline hyperparameter configurations *2, updates its observations with the obtained results, and finally the best ensemble is built for the provided dataset.

*1: Baselines are a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machines, used to solve either a regression or a classification task on the provided dataset.
*2: A pipeline hyperparameter configuration specifies the choice of components in each step, e.g. the target algorithm or the shape of the neural network, together with their corresponding hyperparameters.
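To make the connection to the API concrete, the sketch below shows how the portfolio warm-start and the baseline evaluation are typically toggled when calling search(). The portfolio_selection and enable_traditional_pipeline arguments are assumptions based on recent releases; verify them against the documentation of your installed version.

# Sketch of how the workflow pieces surface in the API
# (argument names are assumptions; check your installed version).
import sklearn.datasets
import sklearn.model_selection
from autoPyTorch.api.tabular_classification import TabularClassificationTask

X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)

api = TabularClassificationTask()
api.search(
    X_train=X_train,
    y_train=y_train,
    optimize_metric='accuracy',
    total_walltime_limit=300,
    func_eval_time_limit_secs=50,
    portfolio_selection='greedy',       # warm-start SMAC with the greedy portfolio
    enable_traditional_pipeline=True,   # evaluate the baseline pool (*1) before the search
)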
pip install autoPyTorch
Auto-PyTorch for time series forecasting requires additional dependencies:
pip install autoPyTorch[forecasting]
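A quick way to confirm the installation succeeded (assuming the package exposes a __version__ attribute):

# Sanity check after installation; __version__ is assumed to be exposed.
import autoPyTorch
print(autoPyTorch.__version__)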
We recommend using Anaconda for development, as follows:
# Following commands assume the user is in a cloned directory of Auto-Pytorch
# We also need to initialize the automl_common repository as follows
# You can find more information about this here:
# https://github.com/automl/automl_common/
git submodule update --init --recursive
# Create the environment
conda create -n auto-pytorch python=3.8
conda activate auto-pytorch
conda install swig
python setup.py install
Similarly, to install all the dependencies for Auto-PyTorch time series forecasting:
git submodule update --init --recursive
conda create -n auto-pytorch python=3.8
conda activate auto-pytorch
conda install swig
pip install -e ".[forecasting]"
In a nutshell:
from autoPyTorch.api.tabular_classification import TabularClassificationTask

# data and metric imports
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)

# initialise Auto-PyTorch api
api = TabularClassificationTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
    optimize_metric='accuracy',
    total_walltime_limit=300,
    func_eval_time_limit_secs=50
)

# Calculate test accuracy
y_pred = api.predict(X_test)
score = api.score(y_pred, y_test)
print("Accuracy score", score)
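Beyond the single accuracy number, it is often useful to inspect what the search found. Continuing from the example above, the calls below assume the sprint_statistics() and show_models() helpers are available on the fitted task object in your installed version:

# Optional inspection of the finished run (continues from the example above;
# these helpers are assumed to exist on the fitted task object).
print(api.sprint_statistics())  # summary of evaluated configurations and the best result
print(api.show_models())        # members of the final ensemble and their weights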
For Time Series Forecasting Tasks

from autoPyTorch.api.time_series_forecasting import TimeSeriesForecastingTask

# data and metric imports
from sktime.datasets import load_longley
targets, features = load_longley()

# define the forecasting horizon
forecasting_horizon = 3

# A dataset optimized by APT-TS can be a list of np.ndarray/pd.DataFrame, where each series is an element of the
# list, or a single pd.DataFrame that records the series.
# Index information: to which series does a timestep belong? This id can be stored as the DataFrame's index or in a
# separate column.
# Within each series, we take the last forecasting_horizon values as test targets and the items before that as
# training targets. Normally the values to be forecasted should follow the training sets.
y_train = [targets[:-forecasting_horizon]]
y_test = [targets[-forecasting_horizon:]]

# The same holds for features. For univariate models, X_train and X_test can be omitted or set to None.
X_train = [features[:-forecasting_horizon]]
# Here X_test indicates the 'known future features': features that are known in advance. Features that are unknown
# could be replaced with NaN or zeros (which will not be used by our networks). If no feature is known beforehand,
# X_test can also be omitted.
known_future_features = list(features.columns)
X_test = [features[-forecasting_horizon:]]

start_times = [targets.index.to_timestamp()[0]]
freq = '1Y'

# initialise Auto-PyTorch api
api = TimeSeriesForecastingTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    optimize_metric='mean_MAPE_forecasting',
    n_prediction_steps=forecasting_horizon,
    memory_limit=16 * 1024,  # currently, forecasting models use much more memory
    freq=freq,
    start_times=start_times,
    func_eval_time_limit_secs=50,
    total_walltime_limit=60,
    min_num_test_instances=1000,  # proxy validation sets; only applied to tasks with more than 1000 series
    known_future_features=known_future_features,
)

# our dataset can directly generate sequences for new datasets
test_sets = api.dataset.generate_test_seqs()

# Calculate the forecasting score
y_pred = api.predict(test_sets)
score = api.score(y_pred, y_test)
print("Forecasting score", score)

For more examples, including customising the search space (see the sketch below), parallelising the code, etc., check out the examples folder:

$ cd examples/

Also, the scripts of the paper "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL" can be found under examples/ensemble in the TPAMI.2021.3067763 branch.
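As a small illustration of customising the search space, the sketch below is modelled on the repository's examples folder; the HyperparameterSearchSpaceUpdates helper, its append() signature, and the search_space_updates constructor argument are assumptions to verify against your installed version.

# Hypothetical sketch of restricting part of the search space, modelled on the
# examples folder (names and signatures are assumptions; compare with examples/).
from autoPyTorch.api.tabular_classification import TabularClassificationTask
from autoPyTorch.utils.hyperparameter_search_space_update import HyperparameterSearchSpaceUpdates

updates = HyperparameterSearchSpaceUpdates()
# Only consider batch sizes between 32 and 128 for the data loader.
updates.append(node_name="data_loader",
               hyperparameter="batch_size",
               value_range=[32, 128],
               default_value=64)

api = TabularClassificationTask(search_space_updates=updates)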
If you would like to contribute to Auto-PyTorch, clone the repository and check out our current development branch:
$ git checkout development

This program is free software: you can redistribute it and/or modify it under the terms of the Apache License 2.0 (please see the LICENSE file).
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
You should have received a copy of the Apache License 2.0 along with this program (see the LICENSE file).
Please refer to the branch TPAMI.2021.3067763 to reproduce the paper "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL".
@article{zimmer-tpami21a,
  author  = {Lucas Zimmer and Marius Lindauer and Frank Hutter},
  title   = {Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year    = {2021},
  note    = {also available under https://arxiv.org/abs/2006.13799},
  pages   = {3079--3090}
}

@incollection{mendoza-automlbook18a,
  author    = {Hector Mendoza and Aaron Klein and Matthias Feurer and Jost Tobias Springenberg and Matthias Urban and Michael Burkart and Max Dippel and Marius Lindauer and Frank Hutter},
  title     = {Towards Automatically-Tuned Deep Neural Networks},
  year      = {2018},
  month     = dec,
  editor    = {Hutter, Frank and Kotthoff, Lars and Vanschoren, Joaquin},
  booktitle = {AutoML: Methods, Systems, Challenges},
  publisher = {Springer},
  chapter   = {7},
  pages     = {141--156}
}

@article{deng-ecml22,
  author    = {Difan Deng and Florian Karl and Frank Hutter and Bernd Bischl and Marius Lindauer},
  title     = {Efficient Automated Deep Learning for Time Series Forecasting},
  year      = {2022},
  booktitle = {Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, {ECML} {PKDD} 2022},
  url       = {https://doi.org/10.48550/arXiv.2205.05511}
}

Auto-PyTorch is developed by the AutoML Groups of the University of Freiburg and Hannover.