In the lecture we saw this hash table implementation:
class HashSet: def __init__(self, num_buckets): # each array element is the head of a linked list of Nodes self.a = num_buckets * [None] def contains(self, x): i = hash(x) % len(self.a) # hash bucket index p = self.a[i] while p != None: if p.val == x: return True p = p.next return False def add(self, x): if not self.contains(x): i = hash(x) % len(self.a) self.a[i] = Node(x, self.a[i]) # prepend to hash chain
Add a method remove(self, x) that will delete a value x from a hash table if it is present.
Write a class HashDictionary that implements a dictionary using a hash table with a fixed number of buckets. The number of buckets should be a constructor argument. The class should support these methods:
Modify the class HashSet from the lecture so that the number of buckets expands dynamically, as we discussed in the lecture. The constructor should take an initial number of buckets plus a load factor limit. Whenever the load factor exceeds the limit, the add() method should expand the table, doubling the number of buckets.
In a hash table, we could store the contents of each hash bucket using a dynamic array (i.e. a Python list) rather than a linked list. Write a class HashArraySet that is like the class HashSet from the lecture, but uses dynamic arrays in this fashion. Now compare the performance of HashSet and HashArraySet: with each class, create a hash table with 1000 buckets, insert 100,000 random values, then look up each of the random values in turn to check that it is in the table.
Write a program that reads an integer N and simulates inserting (100 · N) values into a hash table with N buckets using separate chaining, assuming that hash values will be distributed randomly. The program should print the lengths of the longest and shortest chains. (You do not need to use an actual hash table implementation in solving this exercise.)
Suppose that we insert N values into a hash table with N buckets. What fraction of the buckets do we expect will be empty? Assume that N is a large number.
We'd like to implement a data structure RandomQueue with the following interface:
q.add(x) – add an element to the queue
q.remove() - removes a random element from the queue
q.is_empty() - test if a RandomQueue is empty
All operations should be fast, i.e. not run in linear time. How can we implement this structure?