Model Ensemble

by Chee Yee Lim


Posted on 2021-03-27



Collection of notes on model ensembles - a high-level overview of ensemble concepts and techniques


Model Ensemble

Overview

  • Ensemble techniques are also called meta-algorithms: approaches that combine several ML models into a single predictive model.
    • 3 key techniques are bagging, boosting and stacking.
  • Bagging vs boosting
    • Use boosting when a single model gives very low performance (i.e. underfits).
    • Use bagging when a single model overfits easily.

Bagging

  • Bagging involves training several models/estimators on random subsets of the training data in parallel, then aggregating their individual predictions to form a final prediction.
    • This method is used as a way to reduce the variance of a base model/estimator by introducing randomisation (i.e. bootstrapping) into its construction process and then making an ensemble out of it.
    • Bagging methods work best with strong and complex models (i.e. strong learners - e.g. fully developed decision trees), in contrast with boosting methods, which usually work best with weak models (i.e. weak learners - e.g. shallow decision trees). A sketch follows this list.
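A minimal sketch of bagging with scikit-learn, assuming version 1.2 or later (where the base model argument is named `estimator`) and a synthetic dataset purely for illustration; the hyperparameters are assumptions, not values from these notes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Strong base learner: a fully grown decision tree (max_depth=None).
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=None),
    n_estimators=100,   # number of bootstrapped trees
    max_samples=0.8,    # each tree sees a random 80% bootstrap sample
    bootstrap=True,
    n_jobs=-1,          # train the trees in parallel
    random_state=42,
)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))
```

Bootstrapping plus aggregation is what reduces the variance of the high-variance base trees.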

Boosting

  • Boosting involves training a sequence of weak learners (i.e. models that are only slightly better than random guessing) on repeatedly modified versions of the data. The predictions from all of them are then combined through a weighted majority vote to produce the final prediction.
    • In AdaBoost, each data point has an associated weight, and wrongly classified points have their weights increased before training the next learner.
    • A weighted error rate is calculated for each decision tree; the higher this error rate, the lower the weight given to that tree in the final prediction.
    • \( Weight\ of\ the\ tree = Learning\ rate \times \log \frac{(1-e)}{e} \), where \( e \) is the error rate per tree.
    • In gradient boosting, the residuals left by the previous trees are used as the targets for the next decision tree; data points are not re-weighted. A sketch comparing both approaches follows this list.
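A minimal sketch comparing AdaBoost and gradient boosting with scikit-learn; the dataset and hyperparameters are illustrative assumptions, not values from these notes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# AdaBoost: re-weights misclassified points at each step and weights each
# weak learner (a depth-1 tree by default) by its weighted error rate,
# as in the formula above.
ada = AdaBoostClassifier(n_estimators=200, learning_rate=0.5, random_state=42)
ada.fit(X_train, y_train)

# Gradient boosting: each shallow tree is fitted to the residuals (negative
# gradient) of the current ensemble; there is no per-sample re-weighting.
gbm = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=42
)
gbm.fit(X_train, y_train)

print("AdaBoost accuracy:", ada.score(X_test, y_test))
print("Gradient boosting accuracy:", gbm.score(X_test, y_test))
```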

Stacking / Stacked Generalisation

  • Stacking can be viewed as a generalisation of bagging and boosting.
  • In stacking, a set of base learners is used to generate predictions that serve as meta-features, which are fed into a meta-learner to produce the final prediction (see the sketch after this list).
  • It can be difficult to define the meta-features well in practice.
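A minimal sketch of stacking using scikit-learn's StackingClassifier: the base learners produce out-of-fold predictions that act as meta-features, and a logistic regression meta-learner combines them. The choice of base models and dataset is an illustrative assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),
    ],
    final_estimator=LogisticRegression(),  # the meta-learner
    cv=5,  # out-of-fold predictions become the meta-features, avoiding leakage
)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
```

Using cross-validated (out-of-fold) predictions as the meta-features is what keeps the meta-learner from simply memorising the base learners' training-set fit.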