
Auto-PyTorch

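While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, another trend in AutoML is to focus on neural architecture search. To bring together the best of these two worlds, we developed Auto-PyTorch, which jointly and robustly optimizes the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL).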

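Auto-PyTorch is mainly developed to support tabular data (classification, regression) and time series data (forecasting). The newest features of Auto-PyTorch for tabular data are described in the paper "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL" (see below for the BibTeX reference). Details about Auto-PyTorch for multi-horizon time series forecasting tasks can be found in the paper "Efficient Automated Deep Learning for Time Series Forecasting" (also see below for the BibTeX reference).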

Also, see the project documentation.

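From v0.1.0, Auto-PyTorch has been updated to further improve usability, robustness, and efficiency by using SMAC as the underlying optimization package and by changing the code structure. As a result, moving from v0.0.2 to v0.1.0 breaks compatibility. If you would like to use the old API, you can find it at master_old.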

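Workflow

A rough description of the Auto-PyTorch workflow is drawn in the following figure.

In the figure, Data is provided by the user and Portfolio is a set of neural network configurations that work well on diverse datasets. The current version only supports the greedy portfolio described in the paper "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL". This portfolio is used to warm-start the SMAC optimization. In other words, the portfolio is evaluated on the provided data as the set of initial configurations. The API then starts the following procedures: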

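1. Validate input data: Process each data type, e.g. encode categorical data, so that Auto-PyTorch can handle it.
2. Create dataset: Create a dataset that this API can handle, using either cross-validation or holdout splits.
3. Evaluate baselines
   - Tabular dataset *1: Train each algorithm in the predefined pool with a fixed hyperparameter configuration, along with a dummy model from sklearn.dummy that represents the worst possible performance.
   - Time series forecasting dataset: Train a dummy predictor that repeats the last observed value in each series.
4. Search by SMAC:
   a. Determine budget and cut-off rules with Hyperband
   b. Sample a pipeline hyperparameter configuration *2 with SMAC
   c. Update the observations with the obtained results
   d. Repeat a. - c. until the budget runs out
5. Build the best ensemble for the provided dataset from the observations, using model selection for the ensemble.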
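
The SMAC search loop described above (pick a budget, sample configurations, record the results, repeat) can be pictured with a toy successive-halving routine, which is the building block of a single Hyperband bracket. This is only a sketch under stated assumptions: `toy_loss`, the `portfolio` values, and `eta` are hypothetical stand-ins, new candidates are drawn at random rather than proposed by SMAC's model, and nothing here calls Auto-PyTorch or SMAC.

```python
import random


def toy_loss(config, budget):
    """Hypothetical objective (lower is better): distance of the config
    from an arbitrary optimum, plus a penalty that shrinks with budget."""
    return abs(config - 0.3) + 1.0 / budget


def successive_halving(configs, min_budget=1, max_budget=9, eta=3):
    """Evaluate all configs on a small budget, keep the best 1/eta of them,
    and repeat with eta times the budget until max_budget is reached.
    Assumes `configs` is non-empty."""
    observations = []  # (config, budget, loss) triples collected so far
    survivors = list(configs)
    budget = min_budget
    while True:
        scored = sorted((toy_loss(c, budget), c) for c in survivors)
        observations.extend((c, budget, loss) for loss, c in scored)
        if budget >= max_budget:
            break
        survivors = [c for _, c in scored[: max(1, len(scored) // eta)]]
        budget = min(budget * eta, max_budget)
    return scored[0][1], observations


rng = random.Random(0)
portfolio = [0.1, 0.35, 0.8]  # hypothetical warm-start configurations
candidates = portfolio + [rng.random() for _ in range(6)]
best, observations = successive_halving(candidates)
print("best config:", best)
print("evaluations:", len(observations))
```

Placing the portfolio at the front of the candidate list mirrors the warm-start idea: those configurations are always evaluated, and the cheap low-budget round decides which ones deserve more budget.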

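*1: Baselines are a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machines, that solve either the regression or the classification task on the provided dataset.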
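
To make the two dummy baselines from the workflow concrete, here is a minimal pure-Python sketch. The function names `majority_class_baseline` and `last_value_forecast` are hypothetical and are not part of Auto-PyTorch. The first assumes a most-frequent-class strategy for the dummy model (the text only says it comes from sklearn.dummy), and the second repeats the last observed value of each series.

```python
from collections import Counter


def majority_class_baseline(y_train, n_test):
    """Predict the most frequent training label for every test row."""
    most_common_label, _ = Counter(y_train).most_common(1)[0]
    return [most_common_label] * n_test


def last_value_forecast(series_list, horizon):
    """For each series, repeat its last observed value `horizon` times."""
    return [[series[-1]] * horizon for series in series_list]


print(majority_class_baseline([0, 1, 1, 2, 1], n_test=3))
# [1, 1, 1]
print(last_value_forecast([[1.0, 2.0, 3.0], [10.0, 9.0]], horizon=2))
# [[3.0, 3.0], [9.0, 9.0]]
```

Any searched model that cannot beat these predictions has learned nothing useful, which is why they serve as a floor for comparison.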

*2: A pipeline hyperparameter configuration specifies the choice of components in each step, e.g. the target algorithm or the shape of the neural network, together with their corresponding hyperparameters.
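
As an illustration of what such a configuration might look like, the sketch below draws one from a small nested search space. The step names, component names, and hyperparameter ranges are all hypothetical and do not reflect Auto-PyTorch's real search space, which is far larger and is sampled by SMAC rather than uniformly at random.

```python
import random

# Hypothetical search space: for each pipeline step, a set of component
# choices, each with its own integer hyperparameter ranges (low, high).
SEARCH_SPACE = {
    "scaler": {
        "standard": {},
        "minmax": {},
    },
    "network": {
        "mlp": {"num_layers": (1, 4), "num_units": (16, 256)},
        "resnet": {"num_blocks": (1, 3), "num_units": (16, 256)},
    },
    "optimizer": {
        "adam": {"log10_lr": (-5, -1)},
        "sgd": {"log10_lr": (-4, -1)},
    },
}


def sample_configuration(space, rng):
    """Pick one component per step, then one value per hyperparameter."""
    config = {}
    for step, components in space.items():
        name = rng.choice(sorted(components))
        ranges = components[name]
        params = {hp: rng.randint(low, high) for hp, (low, high) in ranges.items()}
        config[step] = {"component": name, "hyperparameters": params}
    return config


config = sample_configuration(SEARCH_SPACE, random.Random(0))
for step, choice in config.items():
    print(step, choice)
```

The point of the nesting is that hyperparameters are conditional: `num_blocks` only exists when the `resnet` component is chosen.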

Installation

PyPI Installation

pip install autoPyTorch

Auto-PyTorch for time series forecasting requires additional dependencies

pip install autoPyTorch[forecasting]

Manual Installation

We recommend using Anaconda for development, as follows:

# The following commands assume the user is in a cloned directory of Auto-PyTorch

# We also need to initialize the automl_common repository as follows
# You can find more information about this here:
# https://github.com/automl/automl_common/
git submodule update --init --recursive

# Create the environment
conda create -n auto-pytorch python=3.8
conda activate auto-pytorch
conda install swig
python setup.py install

Similarly, to install all the dependencies for Auto-PyTorch time series forecasting:

git submodule update --init --recursive

conda create -n auto-pytorch python=3.8
conda activate auto-pytorch
conda install swig
pip install -e ".[forecasting]"
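
After either installation route, a quick way to confirm that the package is visible to the active environment is to look it up without importing it. This is a minimal sketch using only the standard library. The helper name `is_installed` is hypothetical, and the extra module names checked alongside `autoPyTorch` are illustrative.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in ("autoPyTorch", "torch", "sklearn"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If `autoPyTorch` is reported as missing, the most common cause is that the conda environment created above has not been activated in the current shell.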

Examples

In a nutshell:

from autoPyTorch.api.tabular_classification import TabularClassificationTask

# data and metric imports
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics

X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)

# initialise Auto-PyTorch api
api = TabularClassificationTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
    optimize_metric='accuracy',
    total_walltime_limit=300,
    func_eval_time_limit_secs=50
)

# Calculate test accuracy
y_pred = api.predict(X_test)
score = api.score(y_pred, y_test)
print("Accuracy score", score)

For time series forecasting tasks:

from autoPyTorch.api.time_series_forecasting import TimeSeriesForecastingTask

# data and metric imports
from sktime.datasets import load_longley
targets, features = load_longley()

# define the forecasting horizon
forecasting_horizon = 3

# The dataset optimized by APT-TS can be a list of np.ndarray / pd.DataFrame, where each
# element of the list is one series, or a single pd.DataFrame that also records the series
# index: which series does each timestep belong to? This id can be stored as the
# DataFrame's index or as a separate column.
# Within each series, we take the last forecasting_horizon values as test targets and the
# items before them as training targets.
# Normally the values to be forecast should follow the training set.
y_train = [targets[:-forecasting_horizon]]
y_test = [targets[-forecasting_horizon:]]

# Same for features. For univariate models, X_train and X_test can be omitted and set to None.
X_train = [features[:-forecasting_horizon]]
# Here X_test holds the 'known future features': features that are known in advance.
# Features that are unknown can be replaced with NaN or zeros (they will not be used by
# our networks). If no feature is known beforehand, X_test can also be omitted.
known_future_features = list(features.columns)
X_test = [features[-forecasting_horizon:]]

start_times = [targets.index.to_timestamp()[0]]
freq = '1Y'

# initialise Auto-PyTorch api
api = TimeSeriesForecastingTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    optimize_metric='mean_MAPE_forecasting',
    n_prediction_steps=forecasting_horizon,
    memory_limit=16 * 1024,  # Currently, forecasting models use much more memory
    freq=freq,
    start_times=start_times,
    func_eval_time_limit_secs=50,
    total_walltime_limit=60,
    min_num_test_instances=1000,  # proxy validation sets. This only works for tasks with more than 1000 series
    known_future_features=known_future_features,
)

# our dataset can directly generate sequences for new datasets
test_sets = api.dataset.generate_test_seqs()

# Calculate the test score
y_pred = api.predict(test_sets)
score = api.score(y_pred, y_test)
print("Forecasting score", score)
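
The hold-out split and the metric used in the forecasting example can be illustrated without any library. The sketch assumes that MAPE is the mean of |y - y_hat| / |y| over the forecast horizon and that the `mean_` prefix denotes averaging across series; both are assumptions about the metric name, not statements about Auto-PyTorch's implementation. `split_last` and `mape` are hypothetical helpers, the data is made up, and `horizon` must be at least 1.

```python
def split_last(series, horizon):
    """Hold out the last `horizon` values of a series as test targets."""
    return series[:-horizon], series[-horizon:]


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.1 means 10%)."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Two made-up series, one list element per series.
all_series = [
    [100.0, 110.0, 120.0, 130.0, 140.0],
    [50.0, 50.0, 40.0, 40.0, 50.0],
]
horizon = 2

scores = []
for series in all_series:
    train, test = split_last(series, horizon)
    forecast = [train[-1]] * horizon  # naive last-value forecast
    scores.append(mape(test, forecast))

mean_mape = sum(scores) / len(scores)
print("per-series MAPE:", scores)
print("mean MAPE:", mean_mape)
```

Because MAPE divides by the true value, it is undefined when a target equals zero, which is worth keeping in mind when choosing the optimization metric for a dataset.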

For more examples, including customising the search space, parallelising the code, etc., check out the examples folder

$ cd examples/

Code for the paper is available under examples/ensemble in the TPAMI.2021.3067763 branch.

Contributing

If you want to contribute to Auto-PyTorch, clone the repository and check out the current development branch

$ git checkout development

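License

This program is free software: you can redistribute it and/or modify it under the terms of the Apache License 2.0 (please see the LICENSE file).

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

You should have received a copy of the Apache License 2.0 along with this program (see the LICENSE file).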

Reference

Please refer to the branch TPAMI.2021.3067763 to reproduce the paper "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL".

@article{zimmer-tpami21a,
  author  = {Lucas Zimmer and Marius Lindauer and Frank Hutter},
  title   = {Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year    = {2021},
  note    = {also available under https://arxiv.org/abs/2006.13799},
  pages   = {3079--3090}
}

@incollection{mendoza-automlbook18a,
  author    = {Hector Mendoza and Aaron Klein and Matthias Feurer and Jost Tobias Springenberg and Matthias Urban and Michael Burkart and Max Dippel and Marius Lindauer and Frank Hutter},
  title     = {Towards Automatically-Tuned Deep Neural Networks},
  year      = {2018},
  month     = dec,
  editor    = {Hutter, Frank and Kotthoff, Lars and Vanschoren, Joaquin},
  booktitle = {AutoML: Methods, Systems, Challenges},
  publisher = {Springer},
  chapter   = {7},
  pages     = {141--156}
}

@inproceedings{deng-ecml22,
  author    = {Difan Deng and Florian Karl and Frank Hutter and Bernd Bischl and Marius Lindauer},
  title     = {Efficient Automated Deep Learning for Time Series Forecasting},
  year      = {2022},
  booktitle = {Machine Learning and Knowledge Discovery in Databases. Research Track
               - European Conference, {ECML} {PKDD} 2022},
  url       = {https://doi.org/10.48550/arXiv.2205.05511}
}