ML Wiki

Machine Learning Wiki - A collection of ML concepts, algorithms, and resources.

Hash Function

algorithms

Hash Function

Performance of Hash Tables depends on how good a hash functions is

In chaining implementation

insert/delete could be anywhere from $\cfrac{m}{n}$ to $m$ for $m$ objects
$\cfrac{m}{n}$ - equally distributed
$m$ - all in the same bucket

That means that performance depends on the choice of hash function

Good Hash Function

So a good hash function should

spread data out evenly
be easy to store
be fast to evaluate

Bad Hash Function

Example of bad function

given: memory locations for objects
$h(x) = x \mod 1000$
all odd buckets will be empty Pathological data sets
even a super-clever hash function does not guarantee even distribution
for every hash function there exists a pathological data set

Collisions

A ‘‘collision’’ is

distinct $x, y \in U$
such that $h(x) = h(y)$

Well-known issue: Birthday paradox

See also

Hash Tables

Sources

Algorithms Design and Analysis Part 1 (coursera)

✏️ Edit on GitHub