ML Wiki

Motivation: Searching

Suppose we have a relation $R(A, B, C, D)$, each tuple - 32 bytes

$128 \times 10^6$ tuples is $R$
Block size is $B$ = 4kb
I.e. can store 128 tuples per block, $10^6$ tuples in total

Suppose we want to find a tuple with $C = 10$

Searching in I/O Model of Computation

for each block $X \in R$
- load $X$
- check if there's a tuple $T \in X$ with $C = 10$
- yes - output $T$
- no - continue
- release $X$ from memory

In worst case it's $10^6$ I/Os

suppose $10^{-3}$ seconds per I/O operation
16.6 minutes in total

Can we do better?

Indexing

An index is any secondary memory data structure that

takes a search key as input
efficiently returns the collection of matching records

One-Dimensional Indexes

Conventional Indexes

Dense Index
Sparse Index
Secondary Index as a combination of both

Tree-Based Indexes

B-Tree

Hash-Based Indexes

Multi-Dimensional Indexes

Conventional

Multiple-Key Index

Tree-Based Indexes

Hash-Based Indexes

Clustered Index

Index can be clustered or unclustered

When index is clustered it means the records themselves are stored in index, not pointers
I.e. a clustered index ensures that all data is stored in some order
Usually there is only one clustered index per relation (otherwise the data will be duplicated)

If an index (say, B-Tree) is not clustered, then instead of following each pointer other techniques can be used, such as Bitmap Heap Scan

Information Retrieval Indexing

Indexing can also be applied to unstructured data such as text

Inverted Index builds an index from words to documents where these words are contained
Locality Sensitive Hashing gives an approximate answer to KNN queries

Sources

Database Systems Architecture (ULB)
Database Systems: The Complete Book (2nd edition) by H. Garcia-Molina, J. D. Ullman, and J. Widom

ML Wiki

Indexing (databases)
Revision as of 12:37, 4 July 2015 by Alexey (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Contents

Motivation: Searching

Indexing

One-Dimensional Indexes

Conventional Indexes

Tree-Based Indexes

Hash-Based Indexes

Multi-Dimensional Indexes

Conventional

Tree-Based Indexes

Hash-Based Indexes

Clustered Index

Information Retrieval Indexing

Sources

Indexing (databases)Revision as of 12:37, 4 July 2015 by Alexey (Talk | contribs) (diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Contents

Motivation: Searching

Indexing

One-Dimensional Indexes

Conventional Indexes

Tree-Based Indexes

Hash-Based Indexes

Multi-Dimensional Indexes

Conventional

Tree-Based Indexes

Hash-Based Indexes

Clustered Index

Information Retrieval Indexing

Sources

Indexing (databases)
Revision as of 12:37, 4 July 2015 by Alexey (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)