# Logistic Regression¶

Used for classification. We want $$0 \leq h_\theta (x) \leq 1$$.

$h_\theta(x) = g(\theta^T x)$, where $g(z) = \frac{1}{1 + e^{-z}} \Rightarrow h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$

$$g(z) = \frac{1}{1 + e^{-z}}$$ is the sigmoid (logistic) function.

< Logistic function plot >
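The sigmoid can be sketched in a few lines of NumPy; the function name `sigmoid` is our own choice, not from the notes:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)). Maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# g(0) = 0.5; large positive z approaches 1, large negative z approaches 0
print(sigmoid(0.0))                     # 0.5
print(sigmoid(np.array([-10.0, 0.0, 10.0])))
```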

The idea of logistic regression is:

• Predict $$y = 1$$ if $$h_\theta (x) \geq 0.5$$, which holds exactly when $$\theta^T x \geq 0$$
• Predict $$y = 0$$ if $$h_\theta (x) < 0.5$$, which holds exactly when $$\theta^T x < 0$$
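Because $g(z) \geq 0.5$ iff $z \geq 0$, the decision rule reduces to the sign of $\theta^T x$. A minimal sketch (the `theta` and `X` values are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X):
    """Predict y = 1 when h_theta(x) >= 0.5, i.e. when theta^T x >= 0."""
    return (X @ theta >= 0).astype(int)

theta = np.array([-1.0, 2.0])      # hypothetical learned parameters
X = np.array([[1.0, 0.2],          # z = -0.6  -> predict 0
              [1.0, 0.8]])         # z =  0.6  -> predict 1
print(predict(theta, X))           # [0 1]
```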

## Linearity of Logistic Regression¶

From Stack Overflow.

The logistic regression model is of the form,

$\mathrm{logit}(p_i) = \mathrm{ln}\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \cdots + \beta_p x_{p,i}.$

It is called a generalized linear model not because the estimated probability is linear, but because the logit of the estimated probability response is a linear function of the parameters.
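We can verify numerically that the logit undoes the sigmoid, recovering the linear predictor; the `beta` and `x` values below are made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# If p = g(beta^T x), then logit(p) = ln(p / (1 - p)) recovers beta^T x,
# i.e. the log-odds is a linear function of the parameters.
beta = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 3.0, 0.7])
z = beta @ x                    # linear predictor
p = sigmoid(z)                  # estimated probability
logit = np.log(p / (1 - p))     # log-odds
print(np.isclose(logit, z))     # True
```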

## Cost Function¶

$\begin{split}\text{cost}(h_\theta(x), y) = \left\{ \begin{array}{lr} - \log(h_\theta(x)) & \text{if y=1 }\\ - \log(1 - h_\theta(x)) & \text{if y=0 } \end{array} \right.\end{split}$

We use a separate cost function for logistic regression. Reusing the squared-error cost from linear regression, with the sigmoid plugged in as the hypothesis, would make $J(\theta)$ non-convex ("wavy") and riddled with local optima, so gradient descent would not be guaranteed to find the global minimum.

$J(\theta) = \frac{1}{m} \sum^{m}_{i=1} \text{cost}(h_\theta(x^i), y^i)$

### Simplified Logistic Regression Cost Function¶

\begin{align}\begin{aligned}\text{cost}(h_\theta(x^i), y^i) = -y^i \log(h_\theta(x^i)) - (1-y^i) \log(1-h_\theta(x^i))\\J(\theta) = - \frac{1}{m} [ \sum^{m}_{i=1} y^i \log(h_\theta (x^i)) + (1-y^i)\log(1-h_\theta(x^i))]\end{aligned}\end{align}
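The simplified cost translates directly to vectorized code. A sketch, with made-up toy data (at $\theta = 0$ every $h_\theta(x^i) = 0.5$, so $J(\theta) = \ln 2 \approx 0.693$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(1.0 / m) * (y @ np.log(h) + (1 - y) @ np.log(1 - h))

X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])  # toy design matrix
y = np.array([1.0, 0.0, 1.0])
print(cost(np.zeros(2), X, y))   # ln(2) ~ 0.6931
```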

### Logistic Regression Cost Function Gradient Descent¶

To minimize $$J(\theta)$$, repeat until convergence:

$\theta_j := \theta_j - \frac{\alpha}{m} \sum^{m}_{i=1} (h_\theta (x^i) - y^i)x_j^i$

This update rule looks identical to linear regression's, but the hypothesis differs: here $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$ instead of $h_\theta(x) = \theta^T x$.

### Vectorized Gradient Descent¶

$\theta := \theta - \frac{\alpha}{m}X^T (g(X\theta) - \vec{y})$
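The vectorized update can be sketched as a loop; the learning rate, iteration count, and toy data below are arbitrary choices for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=5000):
    """Repeat theta := theta - (alpha/m) * X^T (g(X theta) - y)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        theta -= (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
    return theta

# linearly separable toy data; first column is the bias term x_0 = 1
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
print((sigmoid(X @ theta) >= 0.5).astype(int))   # recovers [0 0 1 1]
```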

## Regularization¶

$J(\theta) = - \frac{1}{m} \sum^{m}_{i=1} [y^i \log(h_\theta (x^i)) + (1-y^i)\log(1-h_\theta(x^i))] + \frac{\lambda}{2m} \sum^{n}_{j=1}\theta_j^2$
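Note that the penalty sum starts at $j = 1$, so the intercept $\theta_0$ is not regularized. A sketch of the regularized cost (toy data and parameter values are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_reg(theta, X, y, lam):
    """Regularized cost: penalty (lam / 2m) * sum_{j>=1} theta_j^2,
    leaving the intercept theta[0] unpenalized."""
    m = len(y)
    h = sigmoid(X @ theta)
    unreg = -(1.0 / m) * (y @ np.log(h) + (1 - y) @ np.log(1 - h))
    penalty = (lam / (2.0 * m)) * np.sum(theta[1:] ** 2)
    return unreg + penalty

X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
theta = np.array([0.2, 0.5])
print(cost_reg(theta, X, y, lam=1.0))
```

With `lam = 0` this reduces to the unregularized cost; larger `lam` shrinks the non-intercept parameters toward zero.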