Auto-PyTorch

While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, another trend in AutoML is to focus on neural architecture search. To bring the best of these two worlds together, we developed Auto-PyTorch, which jointly and robustly optimizes the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL).

Auto-PyTorch is mainly developed to support tabular data (classification, regression) and time series data (forecasting). The newest features in Auto-PyTorch for tabular data are described in the paper "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL" (see below for the BibTeX reference). Details about Auto-PyTorch for multi-horizon time series forecasting tasks can be found in the paper "Efficient Automated Deep Learning for Time Series Forecasting" (BibTeX reference also below).

See also the project documentation.

From v0.1.0, Auto-PyTorch has been updated to further improve usability, robustness and efficiency by using SMAC as the underlying optimization package and by restructuring the code. As a result, moving from v0.0.2 to v0.1.0 breaks compatibility. If you would like to use the old API, you can find it at master_old.

Workflow

A rough description of the Auto-PyTorch workflow is drawn in the following figure.

In the figure, Data is provided by the user and Portfolio is a set of neural network configurations that work well on diverse datasets. The current version only supports the greedy portfolio described in the paper Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL. This portfolio is used to warm-start the optimization of SMAC. In other words, we evaluate the portfolio on the provided data as initial configurations. Then the API starts the following procedures:

1. Validate input data: Process each data type, e.g. encode categorical data, so that it can be handled by Auto-PyTorch.
2. Create dataset: Create a dataset that can be handled by this API, with a choice of cross-validation or holdout splits.
3. Evaluate baselines
   - Tabular dataset *1: Train each algorithm in the predefined pool with a fixed hyperparameter configuration, plus a dummy model from sklearn.dummy that represents the worst possible performance.
   - Time series forecasting dataset: Train a dummy predictor that repeats the last observed value in each series.
4. Search by SMAC:
   a. Determine the budget and cut-off rules by Hyperband
   b. Sample a pipeline hyperparameter configuration *2 by SMAC
   c. Update the observations with the obtained results
   d. Repeat a. - c. until the budget is exhausted
5. Build the best ensemble for the provided dataset from the observations, using ensemble model selection.
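The time series baseline described above (a dummy predictor that repeats the last observed value of each series) is simple enough to sketch directly. The following is only an illustration of the idea, not Auto-PyTorch's own implementation; the function names `last_value_forecast` and `forecast_all` are hypothetical.

```python
from typing import List, Sequence


def last_value_forecast(series: Sequence[float], horizon: int) -> List[float]:
    """Repeat the last observed value of one series `horizon` times."""
    if len(series) == 0:
        raise ValueError("series must contain at least one observation")
    if horizon < 1:
        raise ValueError("horizon must be a positive integer")
    return [series[-1]] * horizon


def forecast_all(dataset: Sequence[Sequence[float]], horizon: int) -> List[List[float]]:
    """Apply the naive forecaster independently to every series in the dataset."""
    return [last_value_forecast(series, horizon) for series in dataset]


# Two toy series of different lengths; each is forecast two steps ahead.
toy_dataset = [[1.0, 2.0, 3.0], [10.0, 9.5]]
toy_forecasts = forecast_all(toy_dataset, horizon=2)
print(toy_forecasts)
```

Any model found by the search should beat this kind of predictor; if it does not, the search result is probably not worth using.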

*1: Baselines are a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machines, used to solve either a regression or a classification task on the provided dataset.

*2: A pipeline hyperparameter configuration specifies the choice of components in each step, e.g. the target algorithm or the shape of the neural networks, together with their corresponding hyperparameters.
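The budget and cut-off idea behind the search loop can be illustrated with successive halving, the building block that Hyperband repeats over several brackets. The sketch below is a simplified, self-contained illustration under stated assumptions: configurations are sampled at random rather than by SMAC's model, only a single bracket is run, and `sample_config`, `toy_loss`, the component names and the hyperparameter ranges are all hypothetical stand-ins rather than Auto-PyTorch's actual search space.

```python
import math
import random
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, Any]


def sample_config(rng: random.Random) -> Config:
    """Draw one hypothetical pipeline configuration (component choice + hyperparameters)."""
    return {
        "network": rng.choice(["mlp", "resnet"]),  # illustrative component names
        "num_layers": rng.randint(1, 6),
        "learning_rate": 10 ** rng.uniform(-4, -1),
    }


def toy_loss(config: Config, budget: int) -> float:
    """Stand-in for 'train for `budget` epochs, return validation loss' (lower is better)."""
    distance = abs(math.log10(config["learning_rate"]) + 2.5)
    return distance + 1.0 / budget


def successive_halving(
    configs: List[Config],
    evaluate: Callable[[Config, int], float],
    min_budget: int,
    max_budget: int,
    eta: int = 3,
) -> Tuple[Config, List[Tuple[int, float]]]:
    """Evaluate all configs on a small budget, keep the best 1/eta, raise the budget, repeat."""
    if not configs:
        raise ValueError("at least one configuration is required")
    survivors = list(configs)
    budget = min_budget
    history: List[Tuple[int, float]] = []
    while True:
        # Rank by loss; the index breaks ties so the dicts are never compared.
        ranked = sorted(
            ((evaluate(cfg, budget), idx, cfg) for idx, cfg in enumerate(survivors)),
            key=lambda item: item[:2],
        )
        history.extend((budget, loss) for loss, _, _ in ranked)
        if budget >= max_budget or len(ranked) == 1:
            return ranked[0][2], history
        survivors = [cfg for _, _, cfg in ranked[: max(1, len(ranked) // eta)]]
        budget = min(budget * eta, max_budget)


rng = random.Random(0)
candidates = [sample_config(rng) for _ in range(9)]
best, history = successive_halving(candidates, toy_loss, min_budget=1, max_budget=9)
print(best["network"], len(history))
```

With nine candidates and `eta=3`, nine configurations are evaluated at budget 1, three at budget 3 and one at budget 9, so most of the compute goes to configurations that already looked promising on a cheap evaluation.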

Installation

PyPI Installation

pip install autoPyTorch

Auto-PyTorch for time series forecasting requires additional dependencies:

pip install autoPyTorch[forecasting]
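Because moving from v0.0.2 to v0.1.0 breaks compatibility, a script may want to check which version pip actually installed before it uses the new API. The following is a small sketch using only the standard library; the helper names are hypothetical, and it assumes the distribution is registered under the same name used in the pip command above (`autoPyTorch`).

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.1.0' into (0, 1, 0); stop at the first non-numeric part such as '0rc1'."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad so that '0.1' compares equal to '0.1.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def uses_new_api(version_text: str) -> bool:
    """True for versions at or after the v0.1.0 restructuring."""
    return parse_version(version_text) >= (0, 1, 0)


found = installed_version("autoPyTorch")
if found is None:
    print("autoPyTorch is not installed")
else:
    print(found, "new API" if uses_new_api(found) else "old API (see master_old)")
```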

Manual Installation

We recommend using Anaconda for development, as follows:

# Following commands assume the user is in a cloned directory of Auto-Pytorch

# We also need to initialize the automl_common repository as follows
# You can find more information about this here:
# https://github.com/automl/automl_common/
git submodule update --init --recursive

# Create the environment
conda create -n auto-pytorch python=3.8
conda activate auto-pytorch
conda install swig
python setup.py install

Similarly, to install all the dependencies for Auto-PyTorch-TimeSeriesForecasting:

git submodule update --init --recursive

conda create -n auto-pytorch python=3.8
conda activate auto-pytorch
conda install swig
pip install -e ".[forecasting]"

Examples

In a nutshell:

from autoPyTorch.api.tabular_classification import TabularClassificationTask

# data and metric imports
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics

X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)

# initialise Auto-PyTorch api
api = TabularClassificationTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
    optimize_metric='accuracy',
    total_walltime_limit=300,
    func_eval_time_limit_secs=50
)

# Calculate test accuracy
y_pred = api.predict(X_test)
score = api.score(y_pred, y_test)
print("Accuracy score", score)

For time series forecasting tasks:

from autoPyTorch.api.time_series_forecasting import TimeSeriesForecastingTask

# data and metric imports
from sktime.datasets import load_longley
targets, features = load_longley()

# define the forecasting horizon
forecasting_horizon = 3

# The dataset optimized by APT-TS can be a list of np.ndarray / pd.DataFrame, where each
# element of the list is one series, or a single pd.DataFrame that records the series
# index information (to which series does each timestep belong?). This id can be stored
# as the DataFrame's index or in a separate column.
# Within each series, we take the last forecasting_horizon items as test targets and the
# items before that as training targets.
# Normally the values to be forecasted should follow the training sets
y_train = [targets[:-forecasting_horizon]]
y_test = [targets[-forecasting_horizon:]]

# same for features. For univariate models, X_train and X_test can be omitted and set as None
X_train = [features[:-forecasting_horizon]]
# Here X_test indicates the 'known future features': the features that are known in advance.
# Features that are unknown could be replaced with NaN or zeros (which will not be used by
# our networks). If no feature is known beforehand, we could also omit X_test
known_future_features = list(features.columns)
X_test = [features[-forecasting_horizon:]]

start_times = [targets.index.to_timestamp()[0]]
freq = '1Y'

# initialise Auto-PyTorch api
api = TimeSeriesForecastingTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    optimize_metric='mean_MAPE_forecasting',
    n_prediction_steps=forecasting_horizon,
    memory_limit=16 * 1024,  # Currently, forecasting models use much more memory
    freq=freq,
    start_times=start_times,
    func_eval_time_limit_secs=50,
    total_walltime_limit=60,
    min_num_test_instances=1000,  # proxy validation sets. This only works for tasks with more than 1000 series
    known_future_features=known_future_features,
)

# our dataset could directly generate sequences for new datasets
test_sets = api.dataset.generate_test_seqs()

# Calculate the test score
y_pred = api.predict(test_sets)
score = api.score(y_pred, y_test)
print("Forecasting score", score)
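The two ideas the forecasting example leans on, holding out the last `forecasting_horizon` items of each series and scoring with a MAPE-based metric, can be shown in isolation with plain Python lists. This is a sketch for illustration only: the helper names are hypothetical, and averaging the per-series MAPE is an assumption about what `mean_MAPE_forecasting` computes, not something the example above confirms.

```python
from typing import List, Sequence, Tuple


def split_last_horizon(series: Sequence[float], horizon: int) -> Tuple[List[float], List[float]]:
    """Return (training part, test part), where the test part is the last `horizon` items."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be positive and shorter than the series")
    return list(series[:-horizon]), list(series[-horizon:])


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error of one series, as a fraction (0.1 means 10%)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    # Undefined when a true value is 0; a real metric needs a convention for that case.
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_mape(all_true: Sequence[Sequence[float]], all_pred: Sequence[Sequence[float]]) -> float:
    """Average the per-series MAPE over all series (assumed aggregation)."""
    scores = [mape(t, p) for t, p in zip(all_true, all_pred)]
    return sum(scores) / len(scores)


series = [100.0, 110.0, 120.0, 130.0, 140.0]
train, test = split_last_horizon(series, horizon=2)
prediction = [train[-1]] * len(test)  # naive forecast, only to have something to score
print(train, test, mean_mape([test], [prediction]))
```

Slicing with `[:-horizon]` and `[-horizon:]` is the same pattern the example applies to `targets` and `features`, which keeps the test targets strictly after the training data in time.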

For more examples, including customising the search space, parallelising the code, etc., check out the examples folder:

$ cd examples/

Code for the paper is available under examples/ensemble in the TPAMI.2021.3067763 branch.

Contributing

If you want to contribute to Auto-PyTorch, clone the repository and check out our current development branch:

$ git checkout development

License

This program is free software: you can redistribute it and/or modify it under the terms of the Apache License 2.0 (please see the LICENSE file).

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

You should have received a copy of the Apache License 2.0 along with this program (see the LICENSE file).

Reference

Please refer to the branch TPAMI.2021.3067763 to reproduce the paper Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL.

@article{zimmer-tpami21a,
  author  = {Lucas Zimmer and Marius Lindauer and Frank Hutter},
  title   = {Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year    = {2021},
  note    = {also available under https://arxiv.org/abs/2006.13799},
  pages   = {3079--3090}
}

@incollection{mendoza-automlbook18a,
  author    = {Hector Mendoza and Aaron Klein and Matthias Feurer and Jost Tobias Springenberg and Matthias Urban and Michael Burkart and Max Dippel and Marius Lindauer and Frank Hutter},
  title     = {Towards Automatically-Tuned Deep Neural Networks},
  year      = {2018},
  month     = dec,
  editor    = {Hutter, Frank and Kotthoff, Lars and Vanschoren, Joaquin},
  booktitle = {AutoML: Methods, Systems, Challenges},
  publisher = {Springer},
  chapter   = {7},
  pages     = {141--156}
}

@inproceedings{deng-ecml22,
  author    = {Difan Deng and Florian Karl and Frank Hutter and Bernd Bischl and Marius Lindauer},
  title     = {Efficient Automated Deep Learning for Time Series Forecasting},
  year      = {2022},
  booktitle = {Machine Learning and Knowledge Discovery in Databases. Research Track
               - European Conference, {ECML} {PKDD} 2022},
  url       = {https://doi.org/10.48550/arXiv.2205.05511},
}