Lecture 6

Hash Table

We want:

We can have:

Direct Access array

Consider a setting in which a map with n items uses keys that are known to be integers in a range from 0 to N − 1
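A minimal sketch of this idea, assuming keys are integers with 0 ≤ k < N and that `None` is never stored as a value (the class name `DirectAccessMap` is hypothetical, not from the lecture):

```python
class DirectAccessMap:
    """Map whose keys are integers in range(N): one array slot per possible key."""

    def __init__(self, N):
        self._slots = [None] * N   # slot k holds the value for key k, or None
        self._n = 0                # number of items currently stored

    def get(self, k):
        return self._slots[k]      # O(1): the key itself is the index

    def put(self, k, v):
        if self._slots[k] is None:
            self._n += 1           # k was not present before
        self._slots[k] = v

    def remove(self, k):
        if self._slots[k] is not None:
            self._n -= 1
        self._slots[k] = None

    def __len__(self):
        return self._n
```

Every operation takes O(1) time, but the array uses O(N) space no matter how small n is.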

More general types of keys

What should we do if our keys are not integers in the range from 0 to N − 1?

Hash functions and Hash Tables

Hash functions

Usually specified as the composition of two functions

Hash codes

Component Sum: Partition the bits of the key into components of fixed length
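As a rough illustration, the sketch below treats the key as a non-negative integer, splits it into 16-bit components, and sums them while discarding overflow beyond 32 bits. The function name and both widths are assumptions chosen for the example.

```python
def component_sum_hash(key, component_bits=16, width=32):
    """Sum the fixed-length bit components of a non-negative integer key."""
    component_mask = (1 << component_bits) - 1
    result_mask = (1 << width) - 1
    total = 0
    while key > 0:
        # Add the lowest component, ignoring overflow past `width` bits.
        total = (total + (key & component_mask)) & result_mask
        key >>= component_bits
    return total
```

Because addition is commutative, keys whose components are the same but in a different order get the same hash code.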

Polynomial Accumulation:

What is the intuition?

Intuition on paper
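A possible sketch of polynomial accumulation for strings, evaluated with Horner's rule. The multiplier `a = 33` and the 32-bit width are assumptions for illustration, not values stated in these notes.

```python
def polynomial_hash(s, a=33, width=32):
    """Evaluate x0*a^(n-1) + x1*a^(n-2) + ... + x(n-1), truncated to `width` bits."""
    mask = (1 << width) - 1
    h = 0
    for ch in s:
        # Horner's rule: multiply the running value by a, then add the next character.
        h = (h * a + ord(ch)) & mask
    return h
```

Unlike a plain sum, each character is weighted by its position, so `"ab"` and `"ba"` receive different codes.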

Cyclic Shift

def cyclic_hash(my_string):
    mask = (1 << 32) - 1                    # max 32-bit int
    h = 0                                   # the running sum of the hash value
    for character in my_string:
        h = (h << 5 & mask) | (h >> 27)     # 5-bit cyclic shift
        h += ord(character)                 # the integer representation of the char
    return h

Shift collisions Example
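One way to experiment with the effect of the shift amount is sketched below. The helper names and the tiny word list in the usage note are hypothetical, and the words passed in are assumed to be distinct.

```python
def cyclic_hash_with_shift(s, shift, width=32):
    """Cyclic-shift hash code with a configurable shift amount."""
    mask = (1 << width) - 1
    h = 0
    for ch in s:
        # Rotate the running value left by `shift` bits within `width` bits.
        h = ((h << shift) & mask) | (h >> (width - shift))
        h += ord(ch)
    return h


def count_collisions(words, shift):
    """Number of words (assumed distinct) that share a hash code with an earlier word."""
    codes = {cyclic_hash_with_shift(w, shift) for w in words}
    return len(words) - len(codes)
```

With a shift of 0 the code degenerates to a plain sum of character values, so the anagrams `["stop", "tops", "pots", "spot"]` all collide; with a shift of 5 they receive four different codes.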

Compression Functions

Division

Multiply, Add and Divide (MAD)

Recall that
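The two compression functions can be sketched as follows, assuming the usual definitions: division maps a hash code to `code mod N`, and MAD maps it to `((a * code + b) mod p) mod N`, where p is a prime larger than N, 0 < a < p, and 0 ≤ b < p. The function names and the parameter values in the usage note are illustrative assumptions.

```python
def compress_division(code, N):
    """Division method: code mod N."""
    return code % N


def make_mad(N, p, a, b):
    """Build a MAD compression function ((a*code + b) mod p) mod N.

    Assumes p is prime with p > N, 0 < a < p, and 0 <= b < p.
    """
    def compress(code):
        return ((a * code + b) % p) % N
    return compress
```

For example, with N = 13 the division method sends 44 to cell 5, while `make_mad(13, 101, 3, 7)` sends it to cell 12.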

Collision Handling

Separate Chaining

Map with Separate Chaining

Algorithm get(k):  # get is find
    return A[h(k)].get(k)

Algorithm put(k, v):  # put is insert
    t = A[h(k)].put(k, v)
    if t = null then  {k is a new key}
        n = n + 1
    return t

Algorithm remove(k):
    t = A[h(k)].remove(k)
    if t != null then  {k was found}
        n = n - 1
    return t
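The pseudocode above could be realized in Python roughly as follows. This is a sketch under assumptions: each bucket is a plain list of `[key, value]` pairs, Python's built-in `hash` stands in for the hash code, and the class name `ChainHashMap` is hypothetical.

```python
class ChainHashMap:
    """Map using separate chaining: each cell of A holds a list of [key, value] pairs."""

    def __init__(self, N=11):
        self._A = [[] for _ in range(N)]
        self._n = 0

    def _h(self, k):
        return hash(k) % len(self._A)      # hash code followed by division compression

    def get(self, k):
        for key, value in self._A[self._h(k)]:
            if key == k:
                return value
        return None                        # k not found

    def put(self, k, v):
        bucket = self._A[self._h(k)]
        for entry in bucket:
            if entry[0] == k:              # existing key: replace and return old value
                old, entry[1] = entry[1], v
                return old
        bucket.append([k, v])              # k is a new key
        self._n += 1
        return None

    def remove(self, k):
        bucket = self._A[self._h(k)]
        for i, (key, value) in enumerate(bucket):
            if key == k:                   # k was found
                del bucket[i]
                self._n -= 1
                return value
        return None

    def __len__(self):
        return self._n
```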

Linear probing

Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

put(k, o)
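A sketch of insertion with linear probing, applied to the key sequence above. The table size N = 13 and the hash function h(k) = k mod 13 are assumptions made for this example; the notes as extracted do not state them. Only keys are stored, to keep the sketch short.

```python
def linear_probe_insert(table, k, h):
    """Place k in the first free cell at or after h(k), wrapping around the table."""
    N = len(table)
    start = h(k)
    for j in range(N):
        i = (start + j) % N
        if table[i] is None:
            table[i] = k
            return i
    raise RuntimeError("table is full")


N = 13                                  # assumed table size
table = [None] * N
for key in [18, 41, 22, 44, 59, 32, 31, 73]:
    linear_probe_insert(table, key, lambda k: k % N)

print(table)
```

Under these assumptions, 44, 32, 31, and 73 each collide and are pushed to later cells, producing one contiguous run of occupied cells from index 5 to index 11.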

Quadratic Probing
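As a small illustration, assuming the usual probe sequence (h(k) + j²) mod N for j = 0, 1, 2, …, the cells visited from a given start can be listed like this (the function name is hypothetical):

```python
def quadratic_probe_sequence(start, N):
    """Cells probed by quadratic probing from `start`: (start + j*j) mod N."""
    return [(start + j * j) % N for j in range(N)]
```

For `start = 5` and `N = 13` the sequence reaches only 7 distinct cells, which suggests why quadratic probing can fail to find a free cell once the table is more than about half full.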

Double hashing

A secondary hash function $d(k)$ handles collisions by placing an item in the first available cell of the series $(h(k) + j\,d(k)) \bmod N$ for $j = 0, 1, 2, \ldots, N - 1$

Consider a hash table storing integer keys that handles collisions with double hashing, e.g.:
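One concrete instance, sketched under assumed parameters: N = 13, h(k) = k mod 13, and d(k) = 7 − (k mod 7). These values and the key sequence are illustrative choices, not recovered from the original slide.

```python
def double_hash_insert(table, k, h, d):
    """Place k in the first free cell of the series (h(k) + j*d(k)) mod N."""
    N = len(table)
    for j in range(N):
        i = (h(k) + j * d(k)) % N
        if table[i] is None:
            table[i] = k
            return i
    raise RuntimeError("no free cell found in the probe series")


N = 13                                   # assumed table size
table = [None] * N
for key in [18, 41, 22, 44, 59, 32, 31, 73]:
    double_hash_insert(table, key, lambda k: k % N, lambda k: 7 - k % 7)

print(table)
```

Here 44 and 31 both hash to the occupied cell 5, but their different step sizes (5 and 4) send them to cells 10 and 0 respectively instead of clustering next to cell 5.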

Hashing performance

The load factor $a = n/N$ affects the performance of a hash table
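A common way to keep the load factor bounded is to rehash into a larger array once it passes a threshold. The sketch below assumes a chained table stored as a list of buckets of `(key, value)` pairs; the threshold `max_load=0.75` and the doubling policy are assumptions, not values from the lecture.

```python
def load_factor(n, N):
    """Load factor of a table holding n items in N cells."""
    return n / N


def rehash_if_needed(buckets, n, max_load=0.75):
    """Return the bucket array, rehashed into 2N buckets if n/N exceeds max_load."""
    N = len(buckets)
    if load_factor(n, N) <= max_load:
        return buckets                     # still sparse enough: keep as is
    new_buckets = [[] for _ in range(2 * N)]
    for bucket in buckets:
        for k, v in bucket:
            # Every item must be re-placed, because h depends on the table size.
            new_buckets[hash(k) % (2 * N)].append((k, v))
    return new_buckets
```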

Load Factors

Fun with Hashing

Bloom Filters
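A minimal Bloom filter sketch, assuming the standard construction: a bit array of size m and k hash functions, where membership queries may return false positives but never false negatives. Deriving the k positions from salted SHA-256 digests is an implementation choice made here for illustration.

```python
import hashlib


class BloomFilter:
    """Approximate set: might_contain can be wrong only in the 'yes' direction."""

    def __init__(self, m=1024, k=3):
        self._m = m                      # number of bits
        self._k = k                      # number of hash functions
        self._bits = [False] * m

    def _positions(self, item):
        # Simulate k hash functions by salting one digest with the index i.
        for i in range(self._k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self._m

    def add(self, item):
        for pos in self._positions(item):
            self._bits[pos] = True

    def might_contain(self, item):
        # True only if every one of the k bits is set.
        return all(self._bits[pos] for pos in self._positions(item))
```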

Perfect hashing

Idea: Given our set of keys S, we do some “offline work” to find a hash function that gives no collisions
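The "offline work" could, for instance, be a search over the parameters of a MAD-style function until one is collision-free on S. The exhaustive search below is only a sketch under that assumption; practical perfect-hashing schemes are more refined, and the function name is hypothetical.

```python
from itertools import product


def find_perfect_hash(keys, N, p):
    """Search for (a, b) such that ((a*k + b) mod p) mod N has no collisions on keys.

    Assumes integer keys and a prime p larger than every key. Returns None on failure.
    """
    for a, b in product(range(1, p), range(p)):
        cells = {((a * k + b) % p) % N for k in keys}
        if len(cells) == len(keys):      # every key landed in its own cell
            return a, b
    return None
```

For example, the keys 3, 16, and 29 all collide under k mod 13, yet the search finds parameters that separate them in a table of size 13.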

Cuckoo Hashing

Idea: Store two hash tables with a unique hash function for each
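A sketch of the idea for integer keys, assuming the standard eviction scheme: a new key takes its cell in the first table, and any key it displaces moves to its cell in the other table, and so on. The two hash functions, the class name, and the `max_kicks` limit are hypothetical choices for illustration.

```python
class CuckooHashSet:
    """Set of integer keys stored in two tables, each with its own hash function."""

    def __init__(self, N=11, max_kicks=32):
        self._N = N
        self._tables = [[None] * N, [None] * N]
        self._max_kicks = max_kicks

    def _h(self, which, k):
        # Two assumed hash functions: k mod N and (k // N) mod N.
        return k % self._N if which == 0 else (k // self._N) % self._N

    def contains(self, k):
        # A key can only live in one of two cells, so lookup is O(1) worst case.
        return any(self._tables[t][self._h(t, k)] == k for t in (0, 1))

    def add(self, k):
        if self.contains(k):
            return True
        which = 0
        for _ in range(self._max_kicks):
            i = self._h(which, k)
            # Place k in its cell and pick up whatever was there before.
            k, self._tables[which][i] = self._tables[which][i], k
            if k is None:
                return True              # the cell was empty: done
            which = 1 - which            # evicted key tries the other table
        # Probable eviction cycle: the key now held in k is left unplaced.
        # A full implementation would rehash with new functions at this point.
        return False
```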