Hash Function
Performance of Hash Tables depends on how good a hash functions is
In chaining implementation
- insert/delete could be anywhere from $\cfrac{m}{n}$ to $m$ for $m$ objects
- $\cfrac{m}{n}$ - equally distributed
- $m$ - all in the same bucket
That means that performance depends on the choice of hash function
Good Hash Function
So a good hash function should
- spread data out evenly
- be easy to store
- be fast to evaluate
Bad Hash Function
Example of bad function
- given: memory locations for objects
- $h(x) = x \mod 1000$
- all odd buckets will be empty!
Pathological data sets
- even a super-clever hash function does not guarantee even distribution
- for every hash function there exists a pathological data set
Collisions
A collision is
- distinct $x, y \in U$
- such that $h(x) = h(y)$
Well-known issue: Birthday paradox
See also
Sources