Week 9: Exercises

1. Hash Removal

In the lecture we saw this hash table implementation:

class HashSet:
    def __init__(self, num_buckets):
        # each array element is the head of a linked list of Nodes
        self.a = num_buckets * [None]

    def contains(self, x):
        i = hash(x) % len(self.a)     # hash bucket index
        p = self.a[i]
        while p != None:
            if p.val == x:
                return True
            p = p.next
        return False

    def add(self, x):
        if not self.contains(x):
            i = hash(x) % len(self.a)
            self.a[i] = Node(x, self.a[i])   # prepend to hash chain

Add a method remove(self, x) that will delete a value x from a hash table if it is present.

2. Hash Dictionary

Write a class HashDictionary that implements a dictionary using a hash table with a fixed number of buckets. The number of buckets should be a constructor argument. The class should support these methods:

d.set(key, value)
Add a new (key, value) pair, or update an existing key if present.
d.remove_key(key)
Remove a key and its associated value.
d.lookup(key)
Look up a key and return its associated value, or None if absent.

3. Resizable Hashing

Modify the class HashSet from the lecture so that the number of buckets expands dynamically, as we discussed in the lecture. The constructor should take an initial number of buckets plus a load factor limit. Whenever the load factor exceeds the limit, the add() method should expand the table, doubling the number of buckets.

4. Bucket Arrays

In a hash table, we could store the contents of each hash bucket using a dynamic array (i.e. a Python list) rather than a linked list. Write a class HashArraySet that is like the class HashSet from the lecture, but uses dynamic arrays in this fashion. Now compare the performance of HashSet and HashArraySet: with each class, create a hash table with 1000 buckets, insert 100,000 random values, then look up each of the random values in turn to check that it is in the table.

5. Hash Chain Lengths

Write a program that reads an integer N and simulates inserting (100 · N) values into a hash table with N buckets using separate chaining, assuming that hash values will be distributed randomly. The program should print the lengths of the longest and shortest chains. (You do not need to use an actual hash table implementation in solving this exercise.)

6. Empty Buckets

Suppose that we insert N values into a hash table with N buckets. What fraction of the buckets do we expect will be empty? Assume that N is a large number.

7. Random Queue

We'd like to implement a data structure RandomQueue with the following interface:

All operations should be fast, i.e. not run in linear time. How can we implement this structure?

8. Heap Insertion

A binary min-heap is initially empty. We call its insert() operation N times to add N distinct integers to the heap. How long will this take in the worst case? In the best case? What sequences of values will result in the worst-case or best-case behavior?

9. Sorted Heap

  1. Consider a sorted array. Must it be a valid min-heap?

  2. Suppose that we run heapsort on an array that is already sorted. Will any elements move?

10. Heap Check

Write a function that takes an array and returns True if the array represents a valid binary heap, otherwise False.

11. Heap Positions

Consider a binary min-heap that holds 12 values, with no duplicates.

  1. How many different array positions are possible for the second smallest value in the heap?

  2. How many different array positions are possible for the third smallest value in the heap?

  3. How many different array positions are possible for the fourth smallest value in the heap?

  4. How many different array positions are possible for the largest value in the heap?

12. Thousand Largest

Suppose we'd like to write a program that reads any number of integers from standard input, one per line. When the input ends, the program should print out the 1000 largest numbers that were seen, in sorted order. The input might contain millions or billions of integers, even more than will fit in memory. How can we do this efficiently?