One vs All Classifier
Suppose we have a classifier for sorting out input data into 3 categories:
 class 1 ($\triangle$)
 class 2 ($\square$)
 class 3 ($\times$)
We may turn this problem into 3 binary classification problems (i.e. where we predict only $y \in \{0, 1\}$) to be able to use classifiers such as Logistic Regression.
 We take values of one class and turn them into positive examples, and the rest of classes  into negatives
 Step 1
 triangles are positive, and the rest are negative  and we run a classifier on them.

 and we calculate $h_{\theta}^{(1)}(x)$ for it
 Step 2
 next we do same with squares: make them positive, and the rest  negative

 and we calculate $h_{\theta}^{(2)}(x)$
 Step 3
 finally, we make $\times$s as positive and the rest as negative and calculate $h_{\theta}^{(3)}(x)$

So we have fit 3 classifiers:
 $h_{\theta}^{(i)}(x) = P(y = i  x; \theta), i = 1, 2, 3$
 Now, having calculated the vector $h_{\theta}(x) = [h_{\theta}^{(1)}(x), h_{\theta}^{(2)}(x), h_{\theta}^{(3)}(x)]$ we just pick up the maximal value
 i.e. we choose $\max_{i} h_{\theta}^{(i)}(x)$
Implementation
The implementation is straightforward
 Matlab/Octave implementation can be found here
Sources