Linear Regression
Linear Regression - main article about univariate linear regression
Multiple Features
- suppose we have several features
- and we want to use them all to predict $y$
For example, we want to predict a house’s price
- $y$ - price (dependent)
- $x_1$ - # of bedrooms
- $x_2$ - # of floors
- $x_3$ - age
- $x_4$ - size
let’s use the following notation
- $n$ - number of features
- $m$ - number of examples
- $x^{(i)}$ - the vector of all features of the $i$th training example, $x^{(i)} \in \mathbb{R}^n$
- $x_j^{(i)}$ - $j$th element of $i$th training example
- e.g.
- $x^{(2)} = (x_1^{(2)}, x_2^{(2)}, x_3^{(2)}, x_4^{(2)})$ - vector of all features from the second row
- Recall that for one variable we have
- $h_{\theta}(x) = \theta_0 + \theta_1 x$
- now we have
- $h_{\theta}(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n$
let $x_0 = 1$ (i.e. all $x_0^{(i)} = 1$) - the so-called zeroth feature - always 1 (it multiplies the intercept $\theta_0$)
So now we can view $x^{(i)}$ as an $(n+1)$-dimensional vector: $x^{(i)} \in \mathbb{R}^{n + 1}$, $x = \left[ \begin{matrix} x_0 \\ \vdots \\ x_n \end{matrix} \right]$ and $\theta = \left[ \begin{matrix} \theta_0 \\ \vdots \\ \theta_n \end{matrix} \right] \in \mathbb{R}^{n+1}$
And $h_{\theta}(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^{T} x$ (which is $[\theta_0 \dots \theta_n] \cdot \left[ \begin{matrix} x_0 \\ \vdots \\ x_n \end{matrix} \right]$)
This is called "multivariate linear regression"
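Since the hypothesis is just a dot product, it is one line in numpy. A minimal sketch (not from the original notes; the coefficient and feature values below are made up for illustration):

```python
import numpy as np

# theta_0 .. theta_4 - hypothetical coefficients (intercept first)
theta = np.array([50.0, 0.3, 2.0, -1.5, 0.1])
# x_0 = 1 (zeroth feature), then bedrooms, floors, age, size
x = np.array([1.0, 3.0, 2.0, 40.0, 2104.0])

h = theta @ x  # h_theta(x) = theta^T x, a single predicted price
```

For all $m$ examples at once, stack the $x^{(i)}$ as rows of a design matrix $X$ (with a leading column of ones) and compute $X\theta$.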
Polynomial Regression
- Suppose we want to fit not just the raw features, but combinations of them
- For example, we have two features, height and width, and we combine them into a single feature so that only one parameter $\theta_1$ is fitted
- So we write:
- $h(x) = \theta_0 + \theta_1 x = \theta_0 + \theta_1 \cdot \text{height} \cdot \text{width}$
- ($x = \text{height} \cdot \text{width}$)
- Next, suppose the relationship between $x$ and $y$ is nonlinear
- we may try to use
- $\theta_0 + \theta_1 x + \theta_2 x^2 $
- or even
- $\theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 $
So we have three features instead of one: $x$, $x^2$ and $x^3$. Don't forget to normalize them - it's important because these features have very different scales (see the sketch below)
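A sketch of that last point (made-up data, plain numpy): expand a single feature into $x$, $x^2$, $x^3$, then mean-normalize each column.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # hypothetical feature values
X_poly = np.column_stack([x, x**2, x**3])  # columns on very different scales

# mean normalization: subtract the column mean, divide by the column std
mu = X_poly.mean(axis=0)
sigma = X_poly.std(axis=0)
X_norm = (X_poly - mu) / sigma             # each column now has mean 0, std 1
```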
Computing Coefficients
Gradient Descent for Multivariate Linear Regression
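The linked article has the details; below is a minimal numpy sketch of batch gradient descent for the squared-error cost. The function name and the `alpha` / `n_iters` defaults are my choices, not values from the notes; $X$ is assumed to already contain the $x_0 = 1$ column.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, n_iters=1000):
    """Batch gradient descent for linear regression.

    X: m x (n+1) design matrix, first column all ones; y: length-m targets.
    """
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(n_iters):
        error = X @ theta - y               # h_theta(x^{(i)}) - y^{(i)} for all i
        theta -= alpha / m * (X.T @ error)  # simultaneous update of all theta_j
    return theta
```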
Normal Equation
This is another way of computing the coefficients for multivariate regression: solve for $\theta$ in closed form via $\theta = (X^T X)^{-1} X^T y$, where $X$ is the $m \times (n+1)$ design matrix and $y$ is the vector of targets
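A minimal numpy sketch of this (the helper name is hypothetical; `np.linalg.solve` is used instead of an explicit inverse for numerical stability):

```python
import numpy as np

def normal_equation(X, y):
    # Solves (X^T X) theta = X^T y, i.e. theta = (X^T X)^{-1} X^T y.
    # If X^T X is singular (e.g. redundant features), the usual
    # fallback is the pseudoinverse: np.linalg.pinv(X) @ y.
    return np.linalg.solve(X.T @ X, X.T @ y)
```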
Linear Regression Assumptions
- https://en.wikipedia.org/wiki/Ordinary_least_squares#Assumptions
- http://stats.stackexchange.com/questions/55113/where-do-the-assumptions-for-linear-regression-come-from
- http://people.duke.edu/~rnau/testing.htm
- http://www.statisticssolutions.com/assumptions-of-linear-regression/
- http://pareonline.net/getvn.asp?n=2&v=8
- http://stats.stackexchange.com/questions/16381/what-is-a-complete-list-of-the-usual-assumptions-for-linear-regression