ML Wiki

Motivation: Searching

Suppose we have a relation $R(A, B, C, D)$ in which each tuple takes 32 bytes

Suppose we want to find a tuple with $C = 10$

Without an index we have to scan the whole relation block by block, so in the worst case it takes $10^6$ I/Os
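The cost of such a full scan can be sketched with a little arithmetic. The text gives only the tuple size and the final figure, so the block size (1024 bytes) and the number of tuples ($32 \cdot 10^6$) below are assumptions chosen to be consistent with $10^6$ I/Os; the function name `full_scan_ios` is hypothetical as well.

```python
def full_scan_ios(num_tuples: int, tuple_bytes: int, block_bytes: int) -> int:
    """Worst-case number of block reads for a linear scan (every block is read once)."""
    tuples_per_block = block_bytes // tuple_bytes
    # Integer ceiling division: a partially filled last block still costs one I/O.
    return -(-num_tuples // tuples_per_block)


# Assumed values (not given in the text): 1024-byte blocks, 32 * 10**6 tuples.
print(full_scan_ios(num_tuples=32 * 10**6, tuple_bytes=32, block_bytes=1024))  # 1000000
```

Under these assumptions each block holds 32 tuples, so the relation occupies $10^6$ blocks and a scan that finds the match in the last block reads all of them.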

Can we do better?

An index is any secondary memory data structure that, given a value of the search key, quickly finds the records with that value

An index can be clustered or unclustered

When an index is clustered, the records themselves are stored in the index rather than pointers to them
I.e. a clustered index ensures that the data is physically stored in the order of the index key
Usually there is only one clustered index per relation (otherwise the data would be duplicated)
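The practical effect of clustering can be illustrated with a small simulation. This is a minimal sketch, assuming 1000 records with integer keys and 32 records per block; the helper `blocks_touched` and the shuffled layout are hypothetical stand-ins for real storage.

```python
import random


def blocks_touched(layout, lo, hi, per_block):
    """Count distinct blocks that hold a record with lo <= key <= hi.

    layout[i] is the key stored at physical position i,
    and position i belongs to block i // per_block.
    """
    return len({pos // per_block for pos, key in enumerate(layout) if lo <= key <= hi})


keys = list(range(1000))

clustered = keys[:]                       # physical order follows key order
unclustered = keys[:]
random.Random(0).shuffle(unclustered)     # physical order unrelated to key order

# Range query: 100 <= key <= 199, with 32 records per block (assumed).
print(blocks_touched(clustered, 100, 199, 32))    # 4
print(blocks_touched(unclustered, 100, 199, 32))  # typically far more blocks
```

With the clustered layout the 100 matching records sit in adjacent blocks, while in the shuffled layout they are spread over most of the file.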

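If an index (say, a B-Tree) is not clustered, then instead of following each pointer one by one, other techniques can be used, such as Bitmap Heap Scan (as in PostgreSQL), which first collects the locations of all matching records and then reads each data block only once, in physical order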
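The idea behind such a scan can be sketched as follows. This is a simplified illustration, not the actual implementation in any database system: pointers are assumed to be `(block, slot)` pairs, there is assumed to be no buffer cache, and both function names are hypothetical.

```python
def naive_fetch_ios(pointers):
    """Following every pointer separately costs one block read per pointer (no cache assumed)."""
    return len(pointers)


def bitmap_fetch_plan(pointers):
    """Group pointers by block and return the blocks in physical order with their slots."""
    by_block = {}
    for block, slot in pointers:
        by_block.setdefault(block, []).append(slot)
    # Each block appears once, so it is read once; sorting gives a sequential access pattern.
    return [(block, sorted(slots)) for block, slots in sorted(by_block.items())]


# Pointers returned by an unclustered index, in index (key) order.
pointers = [(7, 2), (3, 0), (7, 5), (3, 9), (1, 4)]
plan = bitmap_fetch_plan(pointers)

print(naive_fetch_ios(pointers))  # 5
print(len(plan))                  # 3
print(plan)                       # [(1, [4]), (3, [0, 9]), (7, [2, 5])]
```

The saving grows with the number of matching records that share a block, which is why this kind of plan is attractive for queries that match many rows.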

Indexing can also be applied to unstructured data such as text

An Inverted Index maps each word to the documents that contain it
Locality Sensitive Hashing gives approximate answers to k-nearest-neighbor (KNN) queries
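A minimal sketch of an inverted index is shown below, assuming documents are plain strings tokenized by lowercasing and splitting on whitespace. The names `build_inverted_index` and `search` and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {
    1: "index speeds up search",
    2: "clustered index stores records",
    3: "hash tables",
}
index = build_inverted_index(docs)

print(search(index, "index"))            # {1, 2}
print(search(index, "clustered index"))  # {2}
```

A query is answered by intersecting the document sets of its words, so no document has to be scanned at query time.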

Database Systems Architecture (ULB)
Database Systems: The Complete Book (2nd edition) by H. Garcia-Molina, J. D. Ullman, and J. Widom