The general goal of an Information Retrieval system: rank relevant items much higher than non-relevant ones

- to do it, the items must be scored

*Retrieval function* is a scoring function that's used to rank documents

- a retrieval function is based on a **retrieval model**: the retrieval model defines the notion of relevance and makes it possible to rank the documents

There are 5 categories of IR models

- they define the retrieval function in different ways
- they also differ in how they define/measure relevance

- main assumption: the relevance of a document $D$ to the query $Q$ is correlated with $\text{similarity}(Q, D)$
- i.e. the more similar a document $D$ is to the query $Q$, the more relevant $D$ is to $Q$
- potentially can use any similarity function

Vector Space Models are the best known

- use Bag-of-Words to build a vector space
- both documents and the query are represented as vectors in this space
- each term is assigned some weight that reflects the importance of this term
- and then we use Cosine Similarity or Inner Product between the query and each document to rank the documents

It's a framework that defines:

- Term VSM: how documents and queries are represented (by the terms they contain)
- a similarity measure defined on this vector space
- there is also Document VSM: how terms are represented (by the documents in which they occur), but it's not very relevant for IR
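
The vector space ranking described above can be illustrated with a minimal sketch. It assumes whitespace tokenization and raw term frequencies as weights (a real system would more likely use something like TF-IDF); the function names `tf_vector`, `cosine` and `rank` are made up for this example.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-Words vector: maps each term to its raw frequency (assumed weighting)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return (doc_index, score) pairs sorted by decreasing similarity to the query."""
    q = tf_vector(query)
    scored = [(i, cosine(q, tf_vector(d))) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda pair: -pair[1])

docs = ["the cat sat on the mat", "dogs chase cats", "stock market news"]
ranking = rank("cat mat", docs)
```

Here the first document ends up on top because it is the only one sharing terms with the query; note that without stemming, "cats" does not match "cat".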

- Boolean Model: only exact match
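- a document is retrieved only if it satisfies all the conditions of the query
- hard to rank: a document either matches or it doesn't, so there is no score to sort by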
- Extended Boolean Model: more flexible
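
A rough sketch of exact-match Boolean retrieval, assuming a purely conjunctive (AND) query and whitespace tokenization; `build_index` and `boolean_and` are hypothetical helper names, not part of any library.

```python
from collections import defaultdict

def build_index(docs):
    """Inverted index: term -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and(index, terms):
    """Ids of the documents that contain every query term (exact match only)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    if not postings:
        return set()
    return set.intersection(*postings)

docs = ["information retrieval models", "boolean retrieval", "language models"]
index = build_index(docs)
matches = boolean_and(index, ["retrieval", "models"])
```

The result is an unordered set of matching documents, which shows why this model is hard to rank with.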

- relevance = "what is the probability that document $D$ is relevant to the query $Q$?"
- Binary Independence Retrieval - classical probabilistic IR model, assumes term independence
- it's sort of "Naive Bayes Classifier" for IR
- BM25 Ranking Function: its performance is comparable to that of TF-IDF weighting
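
For illustration, here is a sketch of one common BM25 variant. The smoothed IDF form and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for the example (other formulations exist), and `bm25_scores` is a hypothetical name. It expects a non-empty document list.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a common BM25 variant."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs  # average document length
    # document frequency: in how many documents each term occurs
    df = Counter(t for d in tokenized for t in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            # smoothed, always non-negative IDF (assumed variant)
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # term frequency saturation with document length normalization
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = ["cat cat dog", "dog bird", "fish"]
scores = bm25_scores("cat", docs)
```

Only the first document contains the query term, so it is the only one with a positive score.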

- from Bayesian Decision Theory
- general risk minimization framework for IR

Query Likelihood scoring method

- use Statistical Language Models for NLP
- Ponte, Jay M., and W. Bruce Croft. "A language modeling approach to information retrieval." 1998. [1]
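
A minimal sketch of query likelihood scoring with a unigram language model per document. The smoothing scheme (linear interpolation with the collection model, often called Jelinek-Mercer) and the weight `lam=0.5` are assumptions added for the example, as is the choice to skip query terms that never occur in the collection; `query_log_likelihood` is a hypothetical name.

```python
import math
from collections import Counter

def query_log_likelihood(query, docs, lam=0.5):
    """log P(query | document model) for each document, unigram model."""
    tokenized = [d.lower().split() for d in docs]
    collection = Counter(t for d in tokenized for t in d)
    collection_len = sum(collection.values())
    scores = []
    for d in tokenized:
        tf = Counter(d)
        log_p = 0.0
        for term in query.lower().split():
            if term not in collection:
                continue  # unseen in the whole collection: skipped (assumption)
            p_doc = tf[term] / len(d) if d else 0.0       # document model
            p_coll = collection[term] / collection_len    # collection model
            # interpolate so that a missing term does not zero out the score
            log_p += math.log((1 - lam) * p_doc + lam * p_coll)
        scores.append(log_p)
    return scores

docs = ["cat cat dog", "dog bird", "fish fish"]
scores = query_log_likelihood("cat", docs)
```

Documents are then ranked by decreasing score; here the first document gets the highest log-likelihood because it actually contains the query term.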

- http://comminfo.rutgers.edu/~aspoerri/InfoCrystal/Ch_2.html
- http://wwwhome.cs.utwente.nl/~hiemstra/papers/IRModelsTutorial-draft.pdf

- Information Retrieval (UFRT)
- Zhai, ChengXiang. "Statistical language models for information retrieval." 2008.