Multinomial Distribution

It's a generalization of the Binomial Distribution to more than two possible outcomes

Maximum Likelihood Estimator

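  • consider the log likelihood function: $\log P(D \mid \theta) = \sum_{w \in V} c(w, D) \, \log P(w \mid \theta)$, where $V$ is the vocabulary and $c(w, D)$ is the number of times word $w$ occurs in document $D$
  • we want to maximize it subject to the constraint that $P(w \mid \theta)$ is a Probability Distribution, i.e. $\sum_{w \in V} P(w \mid \theta) = 1$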
  • use Lagrange Multipliers to convert this constrained optimization problem into an unconstrained one
  • so let $L(\theta, \lambda) = \log P(D \mid \theta) + \lambda \left(1 - \sum_{w \in V} P(w \mid \theta) \right) = \sum_{w \in V} c(w, D) \, \log P(w \mid \theta) + \lambda \left(1 - \sum_{w \in V} P(w \mid \theta) \right)$
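  • setting $\cfrac{\partial L}{\partial P(w \mid \theta)} = \cfrac{c(w, D)}{P(w \mid \theta)} - \lambda = 0$ gives $P(w \mid \theta) = \cfrac{c(w, D)}{\lambda}$
  • substituting this into the constraint gives $\lambda = \sum_{w \in V} c(w, D) = |D|$, the total number of words in $D$
  • so we get $P(w \mid \hat \theta) = \cfrac{c(w, D)}{|D|}$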
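
The result above says the estimator is just the relative frequency of each word. A minimal Python sketch of this, assuming the document is given as a list of tokens; the function names `mle_unigram` and `log_likelihood` and the toy document are made up for illustration and do not come from the source:

```python
from collections import Counter
import math


def mle_unigram(tokens):
    """Return P(w | theta_hat) = c(w, D) / |D| for every word in the token list."""
    counts = Counter(tokens)          # c(w, D)
    total = sum(counts.values())      # |D|
    return {w: c / total for w, c in counts.items()}


def log_likelihood(tokens, probs):
    """Return log P(D | theta) = sum over w of c(w, D) * log P(w | theta)."""
    counts = Counter(tokens)
    return sum(c * math.log(probs[w]) for w, c in counts.items())


# Toy document (hypothetical): 6 tokens over the vocabulary {a, b, c}
doc = "a b a c a b".split()
theta_hat = mle_unigram(doc)

# Any other distribution over the same vocabulary, e.g. the uniform one,
# should not have a higher log likelihood than the estimate
uniform = {w: 1 / len(theta_hat) for w in theta_hat}
ll_hat = log_likelihood(doc, theta_hat)
ll_uniform = log_likelihood(doc, uniform)
```

Here `theta_hat` sums to 1 by construction, and comparing `ll_hat` with `ll_uniform` is only a spot check against one alternative distribution, not a proof of optimality.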


  • Zhai, ChengXiang. "Statistical language models for information retrieval." 2008.