Partitioned Hash Function Index

database-indexes database-systems-architecture

Suppose we use classical approach for hashing a tuple $(a, b)$

To tackle this problem, we define a hash function $h$ in a different way

$h(v_1, …, v_n) = h_1(v_1)

…

h_n(v_n)$

- where $

$ means concatenation

- so it’s a list of functions $h_i$

$h$ produces $k$-bit output
- each function $h_i$ produces $k_i$ bits and $\sum k_i = k$

Example

assume we have 1024 buckets, to address each one we need at least 10 bits
i.e. $k = 10$ and $2^k = 1024$
suppose we decide (say, based on density, etc) that
- $f(x)$ reserves 2 bits,
- $g(y)$ gets 5 bits and
- $h(z)$ - last 3 bits

By partition a hash function this way we can query only for some attributes

Queries

Good support:

No Support for:

range queries
nearest-neighbor queries (physical distance is not reflected in the way we build $h$)

And

Database Systems Architecture (ULB)
Database Systems: The Complete Book (2nd edition) by H. Garcia-Molina, J. D. Ullman, and J. Widom

✏️ Edit on GitHub