Modify our binary search tree implementation from last week's lecture so that it implements a dictionary with these operations:
d.add(key, val)
d.contains_key(key)
d.lookup(key)
The lookup() method should return None if the key is not found.
In the dynamic array class that we saw in the lecture, we started with 10 elements, and doubled the number of array elements on every resize operation. We then showed that we can grow the array to size N in time O(N), because
10 + 20 + 40 + 80 + ... + N = O(N)
Prove that, more generally, for any starting size S and growth factor R, we have
S + RS + R2S + R3S + ... + N = O(N)
where N = RkS, the array size after k resize operations.
Suppose that in our dynamic array implementation instead of doubling the array size at each step we only multiply it by 1.5, which will save memory. By the previous exercise, the time per add() operation will still be O(1) on average. However, it will be more expensive by a constant factor, which is the price we'll pay for the decreased memory usage.
Specifically, if we double the array size at each step, then the total cost of all the resize operations to grow the array to size N will be proportional to
f(N) = 1 + 2 + 4 + 8 + ... + N
If we multiply it by 1.5 at each step, the total cost will be proportional to
g(N) = 1 + (1.5) + (1.5)2 + (1.5)3 + ... = N
So the increase in cost will be lim N → ∞ [g(N) / f(N)], which is a constant. Compute the value of this constant.
Consider the hash function we saw in the lecture:
# Generate a hash code in the range 0 .. 2^32 - 1. def my_hash(s): h = 0 for c in s: d = ord(c) h = (B * h + d) % (2 ** 32) return h
a) If B = 0, what hash function will result? Will this be a good hash function?
b) If B = 1, what hash function will result? Will this be a good hash function?
c) Suppose that we want to place all unique words of War and Peace into 10,000 hash buckets. If we use the hash function above, some values of B will be better than others, because they will result in a more even distribution of hash values. Design an experiment to compare the performance of these choices: B = 2; B = 256; B = 257; B = 65,537; B = 1,000,003.