Boosting and Bagging: use the eBay auction data (file eBayAuctions Ch 13.jmp) with the variable Competitive as the target.
Create a classification tree (Analyze > Predictive Modeling > Partition). Cast the variable "Competitive?" to Y, Response, the variable "Validation" to the Validation field, and all other variables to X, Factor.
In the Partition platform window, use the "Go" button to build the model. For the test set, what is the overall accuracy (red triangle option Show Fit Details)? What is the lift at portion = 0.20 (red triangle option Lift Curve)? Save the prediction formula to the data table (Red Triangle > Save Columns > Save Prediction Formula).
The overall accuracy on the test set (1 - Misclassification Rate) is [ Select ].
Review the lift curve for the test set. The lift at portion = 0.2 for Competitive = 1 is approximately [ Select ], and the lift for Competitive = 0 is approximately [ Select ].
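For reference, the two quantities asked for above can be sketched outside JMP. The following is a minimal Python illustration, assuming the usual definitions: accuracy is 1 minus the misclassification rate, and lift at a given portion is the rate of the target level among the top-ranked rows (sorted by predicted probability of that level) divided by the overall rate of that level. The function names and the toy data are hypothetical, and JMP's handling of ties and rounding may differ.

```python
def accuracy(actual, predicted):
    """Overall accuracy = 1 - misclassification rate."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return 1 - wrong / len(actual)


def lift_at(actual, prob_of_level, level, portion=0.2):
    """Lift for `level` in the top `portion` of rows, ranked by predicted probability."""
    ranked = sorted(zip(prob_of_level, actual), key=lambda t: t[0], reverse=True)
    n_top = max(1, round(portion * len(ranked)))
    # Rate of the target level among the top-ranked rows.
    top_rate = sum(a == level for _, a in ranked[:n_top]) / n_top
    # Rate of the target level in the whole set.
    base_rate = sum(a == level for a in actual) / len(actual)
    return top_rate / base_rate


# Made-up toy data (assumption), not the eBay auctions table.
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
prob_1 = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
predicted = [1 if p >= 0.5 else 0 for p in prob_1]

print(accuracy(actual, predicted))
print(lift_at(actual, prob_1, level=1))
```

To get the lift for Competitive = 0 in this sketch, pass the probability of level 0 (here `[1 - p for p in prob_1]`) together with `level=0`.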
Run the same tree again (red triangle option Redo > Relaunch Analysis), but first select Boosted Tree as the method (from the drop-down menu) and set the random seed to 123 (Random Seed field in the next dialog window). Don't change the other default settings. What is the overall accuracy? What is the lift at portion = 0.20? Save the prediction formula to the data table (Red Triangle > Save Columns > Save Prediction Formula).
The overall accuracy of the Boosted Tree method on the test set is [ Select ].
Review the lift curve for the test set (red triangle option Lift Curve). The lift at portion = 0.2 for Competitive = 1 is approximately [ Select ], and the lift for Competitive = 0 is approximately [ Select ].
Now run the same tree again (red triangle option Redo > Relaunch Analysis), this time selecting Bootstrap Forest as the method. Set the random seed to 123 (in the next dialog window) and accept the other default settings. For the test set, what is the overall accuracy? What is the lift at portion = 0.20? Again, save the prediction formula to the data table (Red Triangle > Save Columns > Save Prediction Formula).
The overall accuracy of the Bootstrap Forest method on the test set (1 - Misclassification Rate) is [ Select ].
The lift at portion = 0.2 for Competitive = 1 is approximately [ Select ], and the lift for Competitive = 0 is approximately [ Select ].
Compare the three models using the Model Comparison platform (Analyze > Predictive Modeling > Model Comparison). In the Model Comparison platform's launch window, cast the variable "Validation" to the Group field and click "OK". Compare the misclassification rates of the three models on the validation set. Which model has the best accuracy?
The model [ Select ] has the best accuracy on the validation set.
Now choose Model Averaging (red triangle option) in the Model Comparison report for Validation. This creates another prediction column in your data table. Then relaunch the Model Comparison platform (Redo > Relaunch Analysis) with the "Validation" variable in the Group field. Compare the misclassification rates of all models. What is the misclassification rate of Model Averaged on the test set?
The misclassification rate of Model Averaged on the validation set is [ Select ].
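As an illustration of what model averaging does, here is a minimal Python sketch. It assumes the averaged model takes the row-wise arithmetic mean of the predicted probabilities from the three saved models and then classifies at a 0.5 cutoff. The function names, the cutoff, and the probabilities below are hypothetical, not values from the eBay data.

```python
def average_probabilities(*model_probs):
    """Row-wise arithmetic mean of each model's predicted probability."""
    return [sum(row) / len(row) for row in zip(*model_probs)]


def misclassification_rate(actual, prob_1, cutoff=0.5):
    """Share of rows whose predicted class (prob >= cutoff -> 1) differs from the actual class."""
    predicted = [1 if p >= cutoff else 0 for p in prob_1]
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)


# Made-up probabilities of Competitive = 1 from three hypothetical models.
actual = [1, 0, 1, 0, 1]
tree = [0.9, 0.6, 0.4, 0.2, 0.7]
boosted = [0.8, 0.3, 0.6, 0.1, 0.4]
forest = [0.7, 0.4, 0.7, 0.3, 0.6]

averaged = average_probabilities(tree, boosted, forest)
print(misclassification_rate(actual, tree))
print(misclassification_rate(actual, averaged))
```

In this toy data the averaged probabilities misclassify fewer rows than the single tree does, but that will not hold in general; compare the rates reported by the Model Comparison platform for your own data.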