One vs All Classifier

Suppose we have a classification problem in which the input data must be sorted into 3 categories:

  • class 1 (△, triangles)
  • class 2 (□, squares)
  • class 3 (×)

multiclass-one-vs-all-01.png

We can turn this problem into 3 binary classification problems (i.e. problems where we predict only $y \in \{0, 1\}$), which lets us use binary classifiers such as Logistic Regression.

  • We take the examples of one class and treat them as positive, and treat the examples of all the other classes as negative
  • Step 1
    • triangles are positive and the rest are negative, and we train a classifier on these relabeled examples
    • multiclass-one-vs-all-02.png
    • this gives us $h_\theta^{(1)}(x)$
  • Step 2
    • next we do the same with squares: they become positive, and the rest negative
    • multiclass-one-vs-all-03.png
    • this gives us $h_\theta^{(2)}(x)$
  • Step 3
    • finally, we make the ×s positive and the rest negative, which gives us $h_\theta^{(3)}(x)$
    • multiclass-one-vs-all-04.png
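
The relabeling done in each step can be sketched in a few lines of Python. This is only an illustration: the helper name `one_vs_all_labels` and the toy label list are assumptions made for the example, not part of any library.

```python
def one_vs_all_labels(y, positive_class):
    """Return binary targets: 1 where the label equals positive_class, else 0."""
    return [1 if label == positive_class else 0 for label in y]


# Hypothetical labels: 1 = triangle, 2 = square, 3 = cross.
y = [1, 1, 2, 3, 2, 3]

# One binary target vector per class, i.e. one binary problem per step.
binary_targets = {k: one_vs_all_labels(y, k) for k in sorted(set(y))}
```

Each entry of `binary_targets` can then be passed, together with the unchanged inputs, to any binary classifier.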


So we have fit 3 classifiers:

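  • $h_\theta^{(i)}(x) = P(y = i \mid x; \theta)$, $i = 1, 2, 3$
  • Now, for a new input $x$, we calculate the vector $h_\theta(x) = \left[ h_\theta^{(1)}(x), h_\theta^{(2)}(x), h_\theta^{(3)}(x) \right]$ and pick the class with the largest value
  • i.e. we predict the class $\arg\max_i h_\theta^{(i)}(x)$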
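
The whole procedure might look roughly like the following pure-Python sketch. It makes several assumptions that the text above does not specify: the binary classifier is logistic regression trained by plain batch gradient descent, the learning rate and iteration count are arbitrary, the function names (`fit_binary`, `fit_one_vs_all`, `predict`) are hypothetical, and the toy data set is invented for the example.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hypothesis(theta, x):
    # theta[0] is the intercept; theta[1:] are the feature weights.
    return sigmoid(theta[0] + sum(t * xi for t, xi in zip(theta[1:], x)))


def fit_binary(X, y, lr=0.1, n_iter=5000):
    """Logistic regression on 0/1 targets via batch gradient descent."""
    m, n = len(X), len(X[0])
    theta = [0.0] * (n + 1)
    for _ in range(n_iter):
        grad = [0.0] * (n + 1)
        for x, target in zip(X, y):
            err = hypothesis(theta, x) - target
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        theta = [t - lr * g / m for t, g in zip(theta, grad)]
    return theta


def fit_one_vs_all(X, y):
    """Fit one binary classifier per class: class k positive, the rest negative."""
    return {
        k: fit_binary(X, [1 if label == k else 0 for label in y])
        for k in sorted(set(y))
    }


def predict(models, x):
    """Return the class whose classifier gives the largest h(x)."""
    return max(models, key=lambda k: hypothesis(models[k], x))


# Invented toy data: three well-separated clusters in the plane.
X = [[0, 0], [1, 0], [0, 1], [1, 1],   # class 1
     [5, 0], [6, 0], [5, 1], [6, 1],   # class 2
     [0, 5], [1, 5], [0, 6], [1, 6]]   # class 3
y = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]

models = fit_one_vs_all(X, y)
predicted = predict(models, [5.5, 0.5])
```

Note that `predict` compares the three outputs directly, so it assumes the classifiers produce comparable scores; with logistic regression each output is an estimate of $P(y = i \mid x; \theta)$.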


Implementation

The implementation is straightforward

  • Matlab/Octave implementation can be found here


Sources