TransmogrifAI
0.7.0
TransMogrifai(發音為Trăns-Mŏgˈrə-fī)是用Scala編寫的自動庫,它在Apache Spark上運行。它的開發是通過機器學習自動化來加速機器學習開發人員的生產力,以及可以執行編譯時類型安全,模塊化和重複使用的API。通過自動化,它可以實現幾乎降低時間的手工調整模型的精度。
如果需要機器學習庫,請使用TransMogrifai以:
要了解Transmogrifai的動機,請查看以下內容:
跳過快速啟動和文檔。
泰坦尼克號數據集是機器學習社區中經常引用的數據集。目的是建立一個機器學習的模型,該模型將從泰坦尼克號乘客清單中預測倖存者。這是您將如何使用TransMogrifai構建模型:
import com . salesforce . op . _
import com . salesforce . op . readers . _
import com . salesforce . op . features . _
import com . salesforce . op . features . types . _
import com . salesforce . op . stages . impl . classification . _
import org . apache . spark . SparkConf
import org . apache . spark . sql . SparkSession
implicit val spark = SparkSession .builder.config( new SparkConf ()).getOrCreate()
import spark . implicits . _
// Read Titanic data as a DataFrame
val passengersData = DataReaders . Simple .csvCase[ Passenger ](path = pathToData).readDataset().toDF()
// Extract response and predictor Features
val (survived, predictors) = FeatureBuilder .fromDataFrame[ RealNN ](passengersData, response = " survived " )
// Automated feature engineering
val featureVector = predictors.transmogrify()
// Automated feature validation and selection
val checkedFeatures = survived.sanityCheck(featureVector, removeBadFeatures = true )
// Automated model selection
val pred = BinaryClassificationModelSelector ().setInput(survived, checkedFeatures).getOutput()
// Setting up a TransmogrifAI workflow and training the model
val model = new OpWorkflow ().setInputDataset(passengersData).setResultFeatures(pred).train()
println( " Model summary: n " + model.summaryPretty())模型摘要:
Evaluated Logistic Regression, Random Forest models with 3 folds and AuPR metric.
Evaluated 3 Logistic Regression models with AuPR between [0.6751930383321765, 0.7768725281794376]
Evaluated 16 Random Forest models with AuPR between [0.7781671467343991, 0.8104798040316159]
Selected model Random Forest classifier with parameters:
|-----------------------|--------------|
| Model Param | Value |
|-----------------------|--------------|
| modelType | RandomForest |
| featureSubsetStrategy | auto |
| impurity | gini |
| maxBins | 32 |
| maxDepth | 12 |
| minInfoGain | 0.001 |
| minInstancesPerNode | 10 |
| numTrees | 50 |
| subsamplingRate | 1.0 |
|-----------------------|--------------|
Model evaluation metrics:
|-------------|--------------------|---------------------|
| Metric Name | Hold Out Set Value | Training Set Value |
|-------------|--------------------|---------------------|
| Precision | 0.85 | 0.773851590106007 |
| Recall | 0.6538461538461539 | 0.6930379746835443 |
| F1 | 0.7391304347826088 | 0.7312186978297163 |
| AuROC | 0.8821603927986905 | 0.8766642291593114 |
| AuPR | 0.8225075757571668 | 0.850331080886535 |
| Error | 0.1643835616438356 | 0.19682151589242053 |
| TP | 17.0 | 219.0 |
| TN | 44.0 | 438.0 |
| FP | 3.0 | 64.0 |
| FN | 9.0 | 97.0 |
|-------------|--------------------|---------------------|
Top model insights computed using correlation:
|-----------------------|----------------------|
| Top Positive Insights | Correlation |
|-----------------------|----------------------|
| sex = "female" | 0.5177801026737666 |
| cabin = "OTHER" | 0.3331391338844782 |
| pClass = 1 | 0.3059642953159715 |
|-----------------------|----------------------|
| Top Negative Insights | Correlation |
|-----------------------|----------------------|
| sex = "male" | -0.5100301587292186 |
| pClass = 3 | -0.5075774968534326 |
| cabin = null | -0.31463114463832633 |
|-----------------------|----------------------|
Top model insights computed using CramersV:
|-----------------------|----------------------|
| Top Insights | CramersV |
|-----------------------|----------------------|
| sex | 0.525557139885501 |
| embarked | 0.31582347194683386 |
| age | 0.21582347194683386 |
|-----------------------|----------------------|
儘管這似乎有點太神奇了,但對於那些想要更多控制的人來說,TransMogrifai還提供了靈活性,可以完全指定所提取的所有功能以及在ML管道中應用的所有算法。訪問我們的文檔網站以獲取完整的文檔,入門,示例,常見問題解答和其他信息。
您只需將TransMogrifai作為現有項目的常規依賴性添加。首先從下面的版本矩陣中選擇TransMogrifai版本以匹配您的項目依賴項(如果不確定 -穩定版本):
| TransMogrifai版本 | 火花版 | Scala版本 | Java版本 |
|---|---|---|---|
| 0.7.1(未發行,主), 0.7.0(穩定) | 2.4 | 2.11 | 1.8 |
| 0.6.1,0.6.0,0.5.3,0.5.2,0.5.1,0.5.0 | 2.3 | 2.11 | 1.8 |
| 0.4.0,0.3.4 | 2.2 | 2.11 | 1.8 |
對於build.gradle添加的Gradle:
repositories {
jcenter()
mavenCentral()
}
dependencies {
// TransmogrifAI core dependency
compile ' com.salesforce.transmogrifai:transmogrifai-core_2.11:0.7.0 '
// TransmogrifAI pretrained models, e.g. OpenNLP POS/NER models etc. (optional)
// compile 'com.salesforce.transmogrifai:transmogrifai-models_2.11:0.7.0'
}對於build.sbt中的SBT添加:
scalaVersion : = " 2.11.12 "
resolvers + = Resolver .jcenterRepo
// TransmogrifAI core dependency
libraryDependencies + = " com.salesforce.transmogrifai " %% " transmogrifai-core " % " 0.7.0 "
// TransmogrifAI pretrained models, e.g. OpenNLP POS/NER models etc. (optional)
// libraryDependencies += "com.salesforce.transmogrifai" %% "transmogrifai-models" % "0.7.0"然後將transmogrifai導入您的代碼:
// TransmogrifAI functionality: feature types, feature builders, feature dsl, readers, aggregators etc.
import com . salesforce . op . _
import com . salesforce . op . aggregators . _
import com . salesforce . op . features . _
import com . salesforce . op . features . types . _
import com . salesforce . op . readers . _
// Spark enrichments (optional)
import com . salesforce . op . utils . spark . RichDataset . _
import com . salesforce . op . utils . spark . RichRDD . _
import com . salesforce . op . utils . spark . RichRow . _
import com . salesforce . op . utils . spark . RichMetadata . _
import com . salesforce . op . utils . spark . RichStructType . _ 訪問我們的文檔網站以獲取完整的文檔,入門,示例,常見問題解答和其他信息。
有關編程API,請參見Scaladoc。
BSD 3條©Salesforce.com,Inc。