by Chee Yee Lim

Posted on 2021-04-30

Collection of notes on model types - focusing on the Bayesian framework (statistical point of view).

- Key concepts
- The main aim of Bayesian methods is not to find a single "best" value of the model parameters, but rather to determine the posterior distribution of the model parameters.
- Both the parameters \( \beta \) and the output \( y \) are estimated as distributions.

- Advantages
- Can be explicit about uncertainty.
- Get a distribution per parameter. Can draw random samples from these parameter distributions to get uncertainty bounds around output.

- Disadvantages
- Sampling process (i.e. model fitting) can be computationally expensive.

- A full Bayesian approach is difficult and often computationally intractable, even for a simple problem.

- A Bayesian model can be fitted with approximate methods such as Monte Carlo (random) sampling, the most common being Markov chain Monte Carlo (MCMC).
- Besides MCMC, there are also Variational Bayes and Approximate Bayesian Computation (ABC).
- MCMC and Variational Bayes are expensive to compute; the cheapest option is maximum a posteriori (MAP) estimation.
- To get a point estimate from a Bayesian model, MAP can be used. MAP returns the mode of the posterior distribution. It is similar to the maximum likelihood estimate (MLE), but with an additional prior distribution. MAP (like MLE) is only guaranteed to find a local maximum.
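As a concrete sketch of the MAP vs. sampling trade-off, the snippet below finds the posterior mode with an optimiser and approximates the full posterior with a hand-rolled random-walk Metropolis sampler. All names and numbers are illustrative; it assumes a unit-variance Gaussian likelihood with a standard-normal prior on the mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative data: 50 draws from N(2, 1); prior on the mean mu is N(0, 1).
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=50)

def log_posterior(mu):
    log_prior = -0.5 * mu**2                   # N(0, 1) prior, up to a constant
    log_lik = -0.5 * np.sum((data - mu) ** 2)  # N(mu, 1) likelihood, up to a constant
    return log_prior + log_lik

# MAP: a cheap point estimate -- the mode of the posterior.
map_est = minimize_scalar(lambda mu: -log_posterior(mu)).x

# MCMC (random-walk Metropolis): samples that approximate the full posterior.
samples, mu = [], 0.0
for _ in range(5000):
    proposal = mu + rng.normal(scale=0.5)
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal
    samples.append(mu)
samples = np.array(samples[1000:])             # drop burn-in

print(map_est, samples.mean(), samples.std())  # the std quantifies uncertainty
```

MAP gives one number; the MCMC samples additionally give an uncertainty estimate (here the posterior standard deviation), which is the point of paying the sampling cost.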

- Key concepts
- In the Bayesian viewpoint, we formulate linear regression using probability distributions rather than point estimates.

- A Bayesian linear regression can be formulated as:
- \( P( \beta | y, X ) = \frac{ P( y | \beta, X ) \times P( \beta | X ) }{ P( y | X ) } \)
- Where \( P( \beta | y, X ) \) is the posterior probability distribution of the model parameters given the inputs and outputs.
- This is derived from Bayes Theorem.
- \( Posterior = \frac{ Likelihood \times Prior }{ Normalization } \)
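In the conjugate special case - a Gaussian prior on \( \beta \) and known noise variance - the posterior above has a closed form. A minimal sketch on made-up data (all numbers are illustrative):

```python
import numpy as np

# Conjugate sketch: prior beta ~ N(0, tau^2 I), known noise variance sigma^2.
rng = np.random.default_rng(1)
n, sigma2, tau2 = 100, 0.25, 10.0
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])  # intercept + one feature
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# Posterior is Gaussian: precision adds the likelihood and prior terms.
post_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)
post_mean = post_cov @ (X.T @ y) / sigma2

# Each parameter gets a distribution, not just a point estimate.
print(post_mean)                    # close to beta_true
print(np.sqrt(np.diag(post_cov)))  # posterior standard deviations
```

Outside this conjugate case (unknown noise variance, non-Gaussian priors), the normalization term has no closed form and the approximate methods above (MCMC, Variational Bayes, MAP) are needed.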

- A Bayesian logistic regression can be formulated as a GLM with a binomial likelihood and a logit link function.
- The prior can be based on prior knowledge; otherwise a weakly informative (vague) Gaussian prior can be used to let the data speak for themselves.
- The likelihood is the product of n Bernoulli trials, \( \prod_{i=1}^n p_i^{y_i} (1 - p_i)^{1-y_i} \),
- Where \( p_i = \frac{1}{1 + e^{-z_i}} \), \( z_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} \) and \( y_i \in \{ 0,1 \} \).
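A minimal sketch of MAP estimation for this model: maximise the Bernoulli log-likelihood plus a Gaussian log-prior on simulated data (all names and numbers here are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

# Simulated data for a two-feature logistic regression (illustrative only).
rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.5, -1.0])
p = 1.0 / (1.0 + np.exp(-X @ beta_true))
y = rng.binomial(1, p)

def neg_log_posterior(beta, tau2=10.0):
    z = X @ beta
    # Bernoulli log-likelihood: sum_i y_i*log(p_i) + (1 - y_i)*log(1 - p_i),
    # written with logaddexp for numerical stability.
    log_lik = np.sum(y * z - np.logaddexp(0.0, z))
    log_prior = -0.5 * np.sum(beta**2) / tau2  # N(0, tau^2 I) prior on beta
    return -(log_lik + log_prior)

beta_map = minimize(neg_log_posterior, np.zeros(3)).x
print(beta_map)  # roughly recovers beta_true
```

Dropping the `log_prior` term turns this into plain MLE; the prior shrinks the MAP estimate slightly towards zero.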

- Key concepts
- Each Bayesian network is a directed acyclic graph.
- Each node represents a random variable, and each directed edge represents a conditional dependency; each node carries a conditional probability distribution given its parents.

- Advantages
- Explainable, in the form of a flowchart.
- Useful when there are dependency structures among the independent variables.
- Only assumes conditional independence between variables given their parents, rather than full independence.
- Useful for decision making under uncertainty (due to Bayesian framework).
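A minimal sketch of such a network - the classic rain/sprinkler/wet-grass example, with illustrative CPT numbers - answering a query by enumerating the joint distribution:

```python
from itertools import product

# Each node holds a conditional probability table (CPT) given its parents.
p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: {True: 0.01, False: 0.99},   # P(S | Rain=True)
               False: {True: 0.4, False: 0.6}}    # P(S | Rain=False)
p_wet = {(True, True): 0.99, (True, False): 0.9,  # P(Wet=True | S, Rain)
         (False, True): 0.8, (False, False): 0.0}

def joint(r, s, w):
    # The DAG factorises the joint: P(R) * P(S | R) * P(W | S, R).
    pw = p_wet[(s, r)]
    return p_rain[r] * p_sprinkler[r][s] * (pw if w else 1.0 - pw)

# Inference by enumeration: P(Rain=True | Wet=True).
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
print(num / den)  # ≈ 0.358
```

Enumeration is exponential in the number of variables; real networks use exact algorithms like variable elimination, or approximate (Monte Carlo) inference.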

- Example use cases