One vs All Classifier

Suppose we need to classify input data into 3 categories:

  • class 1 ($\triangle$)
  • class 2 ($\square$)
  • class 3 ($\times$)

multiclass-one-vs-all-01.png

We may turn this problem into 3 binary classification problems (i.e. where we predict only $y \in \{0, 1\}$) to be able to use classifiers such as Logistic Regression.

  • We treat the examples of one class as positives and the examples of all the other classes as negatives, and repeat this for each class
  • Step 1
    • triangles are positive and the rest are negative, and we train a binary classifier on this relabeled data
    • multiclass-one-vs-all-02.png
    • this gives us the hypothesis $h_{\theta}^{(1)}(x)$
  • Step 2
    • next we do the same with squares: make them positive and the rest negative
    • multiclass-one-vs-all-03.png
    • this gives us $h_{\theta}^{(2)}(x)$
  • Step 3
    • finally, we make the $\times$s positive and the rest negative, which gives us $h_{\theta}^{(3)}(x)$
    • multiclass-one-vs-all-04.png
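
The relabeling used in the three steps above can be sketched in a few lines of Python. This is only an illustration: the function name `one_vs_all_labels` and the toy label list `y` are assumptions made for the example and do not come from the original material.

```python
def one_vs_all_labels(y, positive_class):
    """Return binary labels: 1 where the label equals positive_class, else 0."""
    return [1 if label == positive_class else 0 for label in y]


# Toy labels (hypothetical): 1 = triangle, 2 = square, 3 = cross
y = [1, 2, 3, 1, 3, 2]

# One binary target vector per class, i.e. one binary problem per step
binary_targets = {k: one_vs_all_labels(y, k) for k in sorted(set(y))}

for k, targets in binary_targets.items():
    print(k, targets)
```

Each entry of `binary_targets` is the target vector for one of the binary problems: the features stay the same and only the labels change.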


So we have fit 3 classifiers:

  • $h_{\theta}^{(i)}(x) = P(y = i | x; \theta), i = 1, 2, 3$
  • Now, for a new input $x$, we calculate the vector $h_{\theta}(x) = [h_{\theta}^{(1)}(x), h_{\theta}^{(2)}(x), h_{\theta}^{(3)}(x)]$ and pick the class whose classifier gives the largest value
  • i.e. we predict the class $\arg\max_{i} h_{\theta}^{(i)}(x)$
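
The final decision can be illustrated with a short Python sketch. It assumes the three classifier outputs for some input are already available; the helper name `predict_class` and the numbers in `h` are made up for the example.

```python
def predict_class(probabilities):
    """Given a dict {class: h^(i)(x)}, return the class with the largest value."""
    return max(probabilities, key=probabilities.get)


# Hypothetical outputs of the three binary classifiers for a single input x
h = {1: 0.2, 2: 0.7, 3: 0.4}

predicted = predict_class(h)
print(predicted)  # prints 2
```

Note that the three values need not sum to 1, because each classifier is trained independently; only their ordering matters for the prediction.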


Implementation

The implementation is straightforward

  • Matlab/Octave implementation can be found here
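
For readers without Matlab/Octave, the whole procedure can also be sketched in plain Python. This is not the implementation referred to above, only a minimal sketch under stated assumptions: logistic regression fitted by batch gradient descent, a hand-picked learning rate and number of epochs, and a tiny made-up dataset. All function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_binary(X, y, lr=0.1, epochs=5000):
    """Fit logistic regression by batch gradient descent.

    Returns theta with the intercept as its first component.
    lr and epochs are assumed values chosen for this toy data.
    """
    m = len(X)
    theta = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            row = [1.0] + list(xi)  # prepend the intercept term
            err = sigmoid(sum(t * v for t, v in zip(theta, row))) - yi
            for j, v in enumerate(row):
                grad[j] += err * v
        theta = [t - lr * g / m for t, g in zip(theta, grad)]
    return theta


def fit_one_vs_all(X, y):
    """Train one binary classifier per class (that class vs. the rest)."""
    return {
        k: fit_binary(X, [1 if label == k else 0 for label in y])
        for k in sorted(set(y))
    }


def predict(models, x):
    """Return the class whose classifier outputs the largest probability."""
    row = [1.0] + list(x)
    scores = {
        k: sigmoid(sum(t * v for t, v in zip(theta, row)))
        for k, theta in models.items()
    }
    return max(scores, key=scores.get)


# Made-up, well-separated toy data: three clusters, one per class
X = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.5),
     (5.0, 0.0), (5.5, 0.3), (4.8, 0.4),
     (0.0, 5.0), (0.3, 5.5), (0.4, 4.8)]
y = [1, 1, 1, 2, 2, 2, 3, 3, 3]

models = fit_one_vs_all(X, y)
predictions = [predict(models, x) for x in X]
print(predictions)
```

On this toy data each class is linearly separable from the other two, so every training point should be assigned to its own class. On real data a library optimizer and regularization would normally replace the hand-written gradient descent.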

