Ensemble learning is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. Ensemble learning is primarily used to improve the performance of a model (in classification, prediction, function approximation, etc.) or to reduce the likelihood of an unfortunate selection of a poor one. Other applications of ensemble learning include assigning a confidence to the decision made by the model, selecting optimal (or near-optimal) features, data fusion, incremental learning, nonstationary learning, and error correction. This article focuses on classification-related applications of ensemble learning; however, all the principal ideas described below generalize readily to function approximation and prediction problems as well.
Here we use an ensemble learning algorithm to improve the accuracy of a decision tree model, building a predictive model to understand the key parameters affecting income in the USA.
Bagging (bootstrap aggregating) is an ensemble technique used mainly to reduce the variance of our predictions by combining the results of multiple classifiers, each modelled on a different bootstrap sub-sample of the same data set.
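A minimal R sketch of this idea, hand-rolled with rpart trees; the data frame `train` and the factor outcome column `income` are hypothetical names used here for illustration:

```r
library(rpart)

# Bagging sketch: fit one tree per bootstrap sub-sample of the rows.
# `train` is a hypothetical data frame with a factor outcome `income`.
set.seed(1)
n_bags <- 25
models <- lapply(seq_len(n_bags), function(i) {
  idx <- sample(nrow(train), replace = TRUE)  # bootstrap sub-sample
  rpart(income ~ ., data = train[idx, ])      # one classifier per sub-sample
})

# Combine the classifiers by majority vote over their class predictions
bagged_predict <- function(models, newdata) {
  votes <- sapply(models, function(m)
    as.character(predict(m, newdata, type = "class")))
  apply(votes, 1, function(v) names(which.max(table(v))))
}
```

Averaging (here, voting) across trees grown on different sub-samples is what smooths out the high variance of any single tree.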
Random Forest is an ensemble method and one of the most popular and powerful algorithms in machine learning. A random forest is a model made up of many decision trees. Rather than simply averaging the predictions of the trees (which we could call a “forest”), this model uses two key concepts that give it the name random (see the sketch after this list):
- Random sampling of training data points when building trees
- Random subsets of features considered when splitting nodes
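A minimal fitting sketch with the randomForest package; as above, `train` and `income` are hypothetical names:

```r
library(randomForest)

# Each tree is grown on a bootstrap sample of the rows (concept 1),
# and `mtry` randomly chosen features are tried at each split (concept 2).
set.seed(1)
rf <- randomForest(income ~ ., data = train,
                   ntree = 100,  # number of trees in the forest
                   mtry  = 3)    # features considered at each split
print(rf)  # out-of-bag error estimate and confusion matrix
```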
Step 1. Repeat for ntree from 10 to 200 in increments of 10.
Step 2. Repeat for mtry from 1 to 7 (the number of features to try at each split).
Step 3. Repeat for different seeds num from 1 to 20.
Step 4. Call set.seed(num) and fit the forest.
Step 5. Compute the mean of the accuracies obtained with the different seeds.
Step 6. Take the maximum of these means across all 7 mtry values.
Step 7. Report the resulting best parameter combination.
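A sketch of this grid search in R, assuming hypothetical `train` and `test` data frames with a factor outcome column `income`:

```r
library(randomForest)

# Steps 1-2: the grid of ntree and mtry values to search over
grid <- expand.grid(ntree = seq(10, 200, by = 10), mtry = 1:7)
grid$accuracy <- NA

for (i in seq_len(nrow(grid))) {
  accs <- sapply(1:20, function(num) {   # Step 3: loop over seeds
    set.seed(num)                        # Step 4: fix the seed, then fit
    rf <- randomForest(income ~ ., data = train,
                       ntree = grid$ntree[i], mtry = grid$mtry[i])
    mean(predict(rf, test) == test$income)  # held-out accuracy
  })
  grid$accuracy[i] <- mean(accs)         # Step 5: average over seeds
}

grid[which.max(grid$accuracy), ]         # Steps 6-7: best combination
```

Averaging over several seeds before comparing settings keeps the choice of ntree and mtry from being driven by a single lucky random initialization.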