One vs All Classifier
Suppose we want to classify input data into 3 categories:
- class 1 (△)
- class 2 (◻)
- class 3 (×)
We can turn this problem into 3 binary classification problems (i.e. problems where we predict only y∈{0,1}), so that we can use binary classifiers such as Logistic Regression.
- We take the examples of one class as positive examples and the examples of all the other classes as negative ones
- Step 1
- triangles are positive and the rest are negative, and we train a classifier on this relabeled data
- which gives us the hypothesis $h_\theta^{(1)}(x)$
- Step 2
- next we do the same with squares: they become positive and the rest negative
- which gives us $h_\theta^{(2)}(x)$
- Step 3
- finally, we make the ×s positive and the rest negative, which gives us $h_\theta^{(3)}(x)$
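The three steps above can be sketched in plain Python. This is a minimal illustration, assuming binary logistic regression trained by batch gradient descent as the base classifier; the toy data set, the labels 1, 2, 3 standing in for △, ◻ and ×, and the names `relabel`, `fit_logistic`, `h` and `fit_one_vs_all` are all hypothetical, chosen only for this example.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def h(theta, x):
    # Logistic hypothesis with a bias term: sigmoid(theta_0 + theta_1*x_1 + ...)
    return sigmoid(theta[0] + sum(t * v for t, v in zip(theta[1:], x)))


def relabel(y, positive_class):
    # One-vs-all relabeling: the chosen class becomes 1, every other class 0
    return [1 if label == positive_class else 0 for label in y]


def fit_logistic(X, y, lr=0.3, epochs=3000):
    # Binary logistic regression fitted by batch gradient descent on the
    # mean cross-entropy loss; theta[0] is the bias term
    theta = [0.0] * (len(X[0]) + 1)
    m = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, target in zip(X, y):
            err = h(theta, x) - target
            grad[0] += err
            for j, v in enumerate(x, start=1):
                grad[j] += err * v
        theta = [t - lr * g / m for t, g in zip(theta, grad)]
    return theta


def fit_one_vs_all(X, y, classes):
    # Steps 1-3: one binary classifier per class, keyed by class label
    return {c: fit_logistic(X, relabel(y, c)) for c in classes}


# Hypothetical toy data; labels 1, 2, 3 stand for triangles, squares and crosses
X = [(0.0, 0.0), (0.5, 0.2), (4.0, 0.0), (4.5, 0.3), (0.0, 4.0), (0.3, 4.4)]
y = [1, 1, 2, 2, 3, 3]
thetas = fit_one_vs_all(X, y, classes=[1, 2, 3])
```

Each entry of `thetas` is one fitted binary classifier, so `h(thetas[c], x)` plays the role of $h_\theta^{(c)}(x)$. The learning rate and number of epochs are illustrative values that suit this tiny data set, not tuned defaults.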
So we have fit 3 classifiers:
- $h_\theta^{(i)}(x) = P(y = i \mid x; \theta), \quad i = 1, 2, 3$
- Now, to classify a new input $x$, we compute the vector $h_\theta(x) = [h_\theta^{(1)}(x), h_\theta^{(2)}(x), h_\theta^{(3)}(x)]$ and pick the class with the maximal value
- i.e. we predict $\arg\max_i h_\theta^{(i)}(x)$
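The prediction rule only needs the fitted parameters of the three classifiers. The sketch below assumes logistic hypotheses $h_\theta^{(i)}(x) = \sigma(\theta_0^{(i)} + \theta_1^{(i)} x_1 + \theta_2^{(i)} x_2)$ with two input features; the parameter values are hand-picked for illustration rather than learned, and `predict_one_vs_all` is a hypothetical name.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def h(theta, x):
    # Logistic hypothesis with a bias term: sigmoid(theta_0 + theta_1*x_1 + ...)
    return sigmoid(theta[0] + sum(t * v for t, v in zip(theta[1:], x)))


def predict_one_vs_all(thetas, x):
    # Evaluate h_theta^(i)(x) for every class i and return the argmax class
    scores = {c: h(theta, x) for c, theta in thetas.items()}
    return max(scores, key=scores.get)


# Hypothetical hand-picked parameters [theta_0, theta_1, theta_2], not learned
thetas = {
    1: [6.0, -3.0, -3.0],   # large near the origin
    2: [-6.0, 3.0, -3.0],   # large when x_1 is big and x_2 small
    3: [-6.0, -3.0, 3.0],   # large when x_2 is big and x_1 small
}

print(predict_one_vs_all(thetas, (0.2, 0.1)))  # 1
print(predict_one_vs_all(thetas, (5.0, 0.5)))  # 2
print(predict_one_vs_all(thetas, (0.5, 5.0)))  # 3
```

Because the three classifiers are trained independently, their outputs need not sum to 1 (at $x = (3, 3)$ all three are close to 0); taking the argmax still yields a single class.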
Implementation
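The implementation is straightforward: train one binary classifier per class, then predict the class whose classifier outputs the largest value.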
- A Matlab/Octave implementation can be found here
Sources