# Variational Inference

The name "variational" comes from the calculus of variations: we optimize over a family of distributions, varying the parameters of the approximating distribution at each step.

The idea is to approximate the posterior distribution of the unknowns $p(Z|X)$ with a tractable distribution $q(Z)$. For example, $q(Z)$ may be assumed to have a simple form, e.g., Gaussian, or to factorize in a certain way. For the GMM, it would be sufficient to assume

$q(\mathbf{z}, \pi, \Lambda, \mu) = q(\mathbf{z})\, q(\pi, \Lambda, \mu)$

## Basis

When $q(Z)$ is an approximation for $p(Z|X)$, it is always true that

$\log p(\mathbf{x}) = \mathcal{L}(q) + \mathrm{KL}(q \,\|\, p), \text{ where}$
$\mathcal{L}(q) = \int q(\mathbf{z}) \log\left\{ \frac{p(\mathbf{x}, \mathbf{z})}{q(\mathbf{z})} \right\} d\mathbf{z} \quad \text{(lower bound for } \log p(\mathbf{x})\text{)}$
$\mathrm{KL}(q \,\|\, p) = -\int q(\mathbf{z}) \log\left\{ \frac{p(\mathbf{z} \mid \mathbf{x})}{q(\mathbf{z})} \right\} d\mathbf{z} \quad \text{(KL divergence between } q \text{ and } p\text{)}$
• Goal: maximize $\mathcal{L}(q)$ or, equivalently, minimize $\mathrm{KL}(q \,\|\, p)$ (the decomposition is checked numerically in the sketch below).
• Note: $\mathcal{L}(q)$ is also called the ELBO (evidence lower bound).
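As a quick sanity check of this identity, the sketch below evaluates $\mathcal{L}(q)$ and $\mathrm{KL}(q \,\|\, p)$ by quadrature for a toy conjugate model (prior $z \sim \mathcal{N}(0,1)$, likelihood $x \mid z \sim \mathcal{N}(z,1)$, and an arbitrary Gaussian $q$; the model and all numbers are assumptions for illustration, not from the text):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Toy conjugate model (an illustrative assumption):
#   prior  z ~ N(0, 1),  likelihood  x | z ~ N(z, 1),  observed x = 1.0
# Then the posterior is p(z|x) = N(x/2, 1/2) and the evidence is p(x) = N(x; 0, 2).
x = 1.0
log_px = norm.logpdf(x, loc=0.0, scale=np.sqrt(2.0))

def log_joint(z):
    # log p(x, z) = log p(x|z) + log p(z)
    return norm.logpdf(x, loc=z, scale=1.0) + norm.logpdf(z, loc=0.0, scale=1.0)

# An arbitrary Gaussian approximation q(z) = N(m, s^2)
m, s = 0.3, 0.9
q_pdf = lambda z: norm.pdf(z, loc=m, scale=s)
q_logpdf = lambda z: norm.logpdf(z, loc=m, scale=s)
post_logpdf = lambda z: norm.logpdf(z, loc=x / 2.0, scale=np.sqrt(0.5))

# ELBO: L(q) = E_q[log p(x, z) - log q(z)]
elbo, _ = quad(lambda z: q_pdf(z) * (log_joint(z) - q_logpdf(z)), -10, 10)
# KL(q || p) = E_q[log q(z) - log p(z|x)]
kl, _ = quad(lambda z: q_pdf(z) * (q_logpdf(z) - post_logpdf(z)), -10, 10)

print(f"log p(x)  = {log_px:.6f}")
print(f"ELBO + KL = {elbo + kl:.6f}")  # equals log p(x) for any q
print(f"KL        = {kl:.6f}")         # >= 0, zero iff q is the exact posterior
```

Changing $m$ or $s$ shifts probability mass between the two terms, but their sum stays fixed at $\log p(x)$, which is why maximizing the ELBO is equivalent to minimizing the KL divergence.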

## Mixture models

• A probabilistically grounded way of doing soft clustering, in contrast to k-means.
• Each cluster: a generative model (Gaussian or multinomial)
• Parameters (e.g., mean/covariance) are unknown

The EM (Expectation-Maximization) algorithm automatically discovers all parameters for the K "sources".
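As a quick illustration of soft clustering, here is a minimal sketch using scikit-learn's GaussianMixture (the synthetic two-source data and all settings are assumptions for illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic data from two 2-D "sources" (assumed for illustration)
X = np.vstack([
    rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[3.0, 3.0], scale=1.0, size=(300, 2)),
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.weights_)              # estimated mixing proportions
print(gmm.means_)                # estimated cluster means
print(gmm.predict_proba(X[:3]))  # soft assignments, unlike k-means' hard labels
```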

## Gaussian mixture models

Gaussian mixture models are a probabilistic model for representing normally distributed subpopulations within an overall population. Mixture models in general don’t require knowing which subpopulation a data point belongs to, allowing the model to learn the subpopulations automatically. Since subpopulation assignment is not known, this constitutes a form of unsupervised learning. [1]

[Figure: (left) fit with a single Gaussian distribution; (right) fit with a Gaussian mixture model with two components [1]]

One-dimensional Model

$$\begin{align}
p(x) &= \sum_{i=1}^K \phi_i \,\mathcal{N}(x \mid \mu_i, \sigma_i) \\
\mathcal{N}(x \mid \mu_i, \sigma_i) &= \frac{1}{\sigma_i\sqrt{2\pi}} \exp\left(-\frac{(x-\mu_i)^2}{2\sigma_i^2}\right) \\
\sum_{i=1}^K \phi_i &= 1
\end{align}$$

Multi-dimensional Model

$$\begin{align}
p(\vec{x}) &= \sum_{i=1}^K \phi_i \,\mathcal{N}(\vec{x} \mid \vec{\mu}_i, \Sigma_i) \\
\mathcal{N}(\vec{x} \mid \vec{\mu}_i, \Sigma_i) &= \frac{1}{\sqrt{(2\pi)^D |\Sigma_i|}} \exp\left(-\frac{1}{2}(\vec{x}-\vec{\mu}_i)^\mathrm{T} \Sigma_i^{-1} (\vec{x}-\vec{\mu}_i)\right) \\
\sum_{i=1}^K \phi_i &= 1
\end{align}$$

where $D$ is the dimensionality of $\vec{x}$.
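To make the multi-dimensional density concrete, the sketch below evaluates $p(\vec{x})$ for a hypothetical two-component, 2-D mixture (all parameter values are made up for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical two-component, 2-D mixture (illustrative parameter values)
phi = [0.4, 0.6]                                        # mixing weights, sum to 1
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]      # component means
covs = [np.eye(2), np.array([[1.0, 0.5], [0.5, 2.0]])]  # component covariances

def gmm_density(x):
    # p(x) = sum_i phi_i * N(x | mu_i, Sigma_i)
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
               for w, m, c in zip(phi, mus, covs))

print(gmm_density(np.array([1.0, 1.0])))
```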

## EM algorithm

EM solves a chicken-and-egg problem [2]:

• Need $(\mu_a, \sigma_a^2)$ and $(\mu_b, \sigma_b^2)$ to guess the source of each point
• Need to know the sources to estimate $(\mu_a, \sigma_a^2)$ and $(\mu_b, \sigma_b^2)$

### Idea

1. Start with two randomly placed Gaussians $(\mu_a, \sigma_a^2)$ and $(\mu_b, \sigma_b^2)$
2. (E-step) For each point $x_i$, compute $P(b \mid x_i)$: does it look like it came from $b$?
3. (M-step) Adjust $(\mu_a, \sigma_a^2)$ and $(\mu_b, \sigma_b^2)$ to fit the points assigned to them
4. Iterate until convergence (a code sketch of this loop follows below)
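Here is a minimal NumPy sketch of this loop for two 1-D Gaussians (the synthetic data, initialization scheme, and fixed iteration count are all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data drawn from two "sources" (assumed for illustration)
data = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])

def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# 1. Start with two randomly placed Gaussians (plus equal mixing weights)
mu = rng.choice(data, size=2)
var = np.ones(2)
phi = np.array([0.5, 0.5])

for _ in range(100):  # 4. iterate (a fixed count stands in for a convergence test)
    # 2. E-step: responsibility P(k | x_i) of each component k for each point
    dens = np.stack([phi[k] * gaussian_pdf(data, mu[k], var[k]) for k in range(2)])
    resp = dens / dens.sum(axis=0)
    # 3. M-step: refit (mu_k, var_k) and phi_k to the softly assigned points
    nk = resp.sum(axis=1)
    mu = (resp * data).sum(axis=1) / nk
    var = (resp * (data - mu[:, None]) ** 2).sum(axis=1) / nk
    phi = nk / len(data)

print("means:", mu, "variances:", var, "weights:", phi)
```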

References