In the lecture we saw this hash table implementation:
```python
class HashSet:
    def __init__(self, num_buckets):
        # each array element is the head of a linked list of Nodes
        self.a = num_buckets * [None]

    def contains(self, x):
        i = hash(x) % len(self.a)   # hash bucket index
        p = self.a[i]
        while p != None:
            if p.val == x:
                return True
            p = p.next
        return False

    def add(self, x):
        if not self.contains(x):
            i = hash(x) % len(self.a)
            self.a[i] = Node(x, self.a[i])   # prepend to hash chain
```
Add a method remove(self, x) that deletes the value x from the hash table if it is present.
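One possible sketch: walk the chain while remembering the previous node, then unlink the match. The Node and HashSet classes from the lecture are repeated here so the sketch is self-contained.

```python
class Node:
    def __init__(self, val, next):
        self.val = val
        self.next = next

class HashSet:
    def __init__(self, num_buckets):
        self.a = num_buckets * [None]

    def contains(self, x):
        p = self.a[hash(x) % len(self.a)]
        while p is not None:
            if p.val == x:
                return True
            p = p.next
        return False

    def add(self, x):
        if not self.contains(x):
            i = hash(x) % len(self.a)
            self.a[i] = Node(x, self.a[i])

    def remove(self, x):
        i = hash(x) % len(self.a)
        p, prev = self.a[i], None
        while p is not None:
            if p.val == x:
                if prev is None:
                    self.a[i] = p.next   # x was the head of the chain
                else:
                    prev.next = p.next   # unlink an interior node
                return
            prev, p = p, p.next
        # x was not present: nothing to do
```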
Write a class HashDictionary that implements a dictionary using a hash table with a fixed number of buckets. The number of buckets should be a constructor argument. The class should support these methods:
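The method list is not shown above; the sketch below assumes a typical interface of set(key, value), get(key), and contains(key) (these names are an assumption, not part of the exercise). Each bucket holds a Python list of (key, value) pairs.

```python
class HashDictionary:
    def __init__(self, num_buckets):
        # each bucket is a Python list of (key, value) pairs
        self.a = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.a[hash(key) % len(self.a)]

    def set(self, key, value):
        b = self._bucket(key)
        for i, (k, _) in enumerate(b):
            if k == key:
                b[i] = (key, value)   # overwrite an existing key
                return
        b.append((key, value))

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None   # key not present

    def contains(self, key):
        return any(k == key for k, _ in self._bucket(key))
```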
Modify the class HashSet from the lecture so that the number of buckets expands dynamically, as we discussed in the lecture. The constructor should take an initial number of buckets plus a load factor limit. Whenever the load factor exceeds the limit, the add() method should expand the table, doubling the number of buckets.
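A sketch of the expanding version, assuming the Node chains from the lecture. The resize policy follows the exercise statement: when size / buckets exceeds the load factor limit, double the number of buckets and rehash every value.

```python
class Node:
    def __init__(self, val, next):
        self.val = val
        self.next = next

class HashSet:
    def __init__(self, num_buckets, load_limit):
        self.a = num_buckets * [None]
        self.size = 0
        self.load_limit = load_limit

    def contains(self, x):
        p = self.a[hash(x) % len(self.a)]
        while p is not None:
            if p.val == x:
                return True
            p = p.next
        return False

    def add(self, x):
        if self.contains(x):
            return
        i = hash(x) % len(self.a)
        self.a[i] = Node(x, self.a[i])
        self.size += 1
        if self.size / len(self.a) > self.load_limit:
            self._grow()

    def _grow(self):
        old = self.a
        self.a = (2 * len(old)) * [None]   # double the bucket count
        for head in old:                   # rehash every stored value
            p = head
            while p is not None:
                i = hash(p.val) % len(self.a)
                self.a[i] = Node(p.val, self.a[i])
                p = p.next
```

Note that values must be rehashed on every expansion, since the bucket index hash(x) % len(self.a) changes when the table grows.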
In a hash table, we could store the contents of each hash bucket using a dynamic array (i.e. a Python list) rather than a linked list. Write a class HashArraySet that is like the class HashSet from the lecture, but uses dynamic arrays in this fashion. Now compare the performance of HashSet and HashArraySet: with each class, create a hash table with 1000 buckets, insert 100,000 random values, then look up each of the random values in turn to check that it is in the table.
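A sketch of the array-bucket variant plus a small timing harness (the harness names and sizes are illustrative; absolute timings are machine-dependent):

```python
import random
import time

class HashArraySet:
    def __init__(self, num_buckets):
        # each bucket is a dynamic array (Python list) of values
        self.a = [[] for _ in range(num_buckets)]

    def contains(self, x):
        return x in self.a[hash(x) % len(self.a)]

    def add(self, x):
        b = self.a[hash(x) % len(self.a)]
        if x not in b:
            b.append(x)

def benchmark(make_set, n=100_000):
    """Insert n random values into a 1000-bucket table, then look each up."""
    s = make_set(1000)
    values = [random.randrange(10**9) for _ in range(n)]
    start = time.time()
    for v in values:
        s.add(v)
    for v in values:
        assert s.contains(v)
    return time.time() - start
```

Running benchmark(HashSet) and benchmark(HashArraySet) with the lecture's HashSet lets you compare the two representations directly; the array version tends to benefit from better memory locality.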
Write a program that reads an integer N and simulates inserting (100 · N) values into a hash table with N buckets using separate chaining, assuming that hash values will be distributed randomly. The program should print the lengths of the longest and shortest chains. (You do not need to use an actual hash table implementation in solving this exercise.)
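As the exercise hints, no real hash table is needed: with random hash values, each insertion simply lands in a uniformly random bucket. A sketch (in the full program, n would be read from input; here a fixed n is used for the demo):

```python
import random

def chain_lengths(n):
    """Simulate inserting 100*n random values into n buckets."""
    counts = n * [0]
    for _ in range(100 * n):
        counts[random.randrange(n)] += 1   # value lands in a random bucket
    return max(counts), min(counts)

longest, shortest = chain_lengths(100)
print('longest chain:', longest)
print('shortest chain:', shortest)
```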
Suppose that we insert N values into a hash table with N buckets. What fraction of the buckets do we expect will be empty? Assume that N is a large number.
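A quick empirical check of the answer: each of the N values misses a given bucket with probability (1 - 1/N), so the bucket stays empty with probability (1 - 1/N)^N, which tends to 1/e ≈ 0.368 for large N.

```python
import math
import random

def empty_fraction(n):
    """Insert n random values into n buckets; return the empty fraction."""
    occupied = set(random.randrange(n) for _ in range(n))
    return 1 - len(occupied) / n

n = 100_000
print(empty_fraction(n))     # simulated fraction of empty buckets
print((1 - 1 / n) ** n)      # exact probability for this n
print(1 / math.e)            # limit as n grows, about 0.3679
```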
We'd like to implement a data structure RandomQueue with the following interface:
q.add(x) – add an element to the queue
q.remove() – remove and return a random element from the queue
q.is_empty() – test whether the queue is empty
All operations should be fast, i.e. not run in linear time. How can we implement this structure?
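One approach, sketched below: store the elements in a dynamic array. To remove a random element, swap it with the last element and pop, so every operation runs in O(1) time (amortized for add).

```python
import random

class RandomQueue:
    def __init__(self):
        self.items = []

    def add(self, x):
        self.items.append(x)

    def remove(self):
        # swap a random element to the end, then pop it off in O(1)
        i = random.randrange(len(self.items))
        self.items[i], self.items[-1] = self.items[-1], self.items[i]
        return self.items.pop()

    def is_empty(self):
        return len(self.items) == 0
```

The key point is that the queue never promises any ordering, so it is free to reorder elements internally; a linked list or an ordinary queue would make random removal linear-time.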
A binary min-heap is initially empty. We call its insert() operation N times to add N distinct integers to the heap. How long will this take in the worst case? In the best case? What sequences of values will result in the worst-case or best-case behavior?
Consider a sorted array. Must it be a valid min-heap?
Suppose that we run heapsort on an array that is already sorted. Will any elements move?
Write a function that takes an array and returns True if the array represents a valid binary heap, otherwise False.
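A sketch, assuming the usual 0-based array representation in which the children of index i sit at 2i + 1 and 2i + 2 (equivalently, the parent of index i is (i - 1) // 2):

```python
def is_min_heap(a):
    """Return True if array a satisfies the min-heap property."""
    for i in range(1, len(a)):
        if a[(i - 1) // 2] > a[i]:   # parent must not exceed its child
            return False
    return True
```

Checking every node against its parent visits each edge of the implicit tree exactly once, so the test runs in O(n) time.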
Consider a binary min-heap that holds 12 values, with no duplicates.
How many different array positions are possible for the second smallest value in the heap?
How many different array positions are possible for the third smallest value in the heap?
How many different array positions are possible for the fourth smallest value in the heap?
How many different array positions are possible for the largest value in the heap?
Suppose we'd like to write a program that reads any number of integers from standard input, one per line. When the input ends, the program should print out the 1000 largest numbers that were seen, in sorted order. The input might contain millions or billions of integers, even more than will fit in memory. How can we do this efficiently?
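One standard approach, sketched here: keep a min-heap of the K largest values seen so far. Each new value is compared with the heap's minimum, so memory stays O(K) regardless of how long the stream is, and the total work is O(n log K).

```python
import heapq

def largest_k(stream, k=1000):
    """Return the k largest values from an iterable, in sorted order."""
    heap = []                            # min-heap of the k largest so far
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)   # evict the current minimum
    return sorted(heap)
```

To read from standard input as the exercise describes, one could call largest_k(int(line) for line in sys.stdin); the generator keeps only one input line in memory at a time.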