Introduction to Algorithms, 2021-2
Week 7: Notes

Some of the topics we discussed today are covered in these sections of Problem Solving with Algorithms:

Here are some more notes:

recurrences

We have already studied how to analyze the running time of iterative functions, i.e. functions that use loops. We'd now like to do the same for recursive functions.

To analyze any recursive function's running time, we perform two steps. First, we write a recurrence, which is an equation that expresses the function's running time recursively, i.e. in terms of itself. Second, we solve the recurrence mathematically.

For example, consider this simple recursive function:

# Compute the sum of the elements in a[lo:hi].
def fun_a(a, lo, hi):
    if hi <= lo:      # empty range
        return 0

    return a[lo] + fun_a(a, lo + 1, hi)

The function computes the sum of a range of values in a list, using recursion to perform its task.
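
For example (with an arbitrary sample list), we can sum an entire list by passing lo = 0 and hi = len(a):

a = [3, 1, 4, 1, 5]
print(fun_a(a, 0, len(a)))    # prints 14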

Let's write down a recurrence that expresses this function's running time. Let T(N) be the time to run this function, where N is hi - lo, i.e. the number of elements in the range. The function performs a constant amount of work outside the recursive call (since we generally assume that integer addition runs in constant time). When it calls itself recursively, the number of elements to sum decreases by one. And so we have the recurrence

T(N) = T(N - 1) + O(1)

Now we'd like to solve the recurrence. We can do so by expanding it algebraically. O(1) is some constant C. We have

T(N) = C + T(N - 1)
     = C + C + T(N - 2)
     = …
     = C + C + ... + C + T(0)
     = NC + T(0)
     = O(N)

The function runs in linear time. That's not a surprise, since an iterative loop to sum the numbers in the list would also run in O(N).
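
For comparison, here is one possible such iterative loop (sum_range is just an illustrative name):

# Iterative version: sum the elements in a[lo:hi].
def sum_range(a, lo, hi):
    total = 0
    for i in range(lo, hi):    # N iterations, each doing O(1) work
        total += a[i]
    return total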

As another example, consider this recursive implementation of binary search:

# Search for the value x in the sorted range a[lo:hi].  If found,
# return its index; otherwise return None.
def search(a, lo, hi, x):
    if hi <= lo:      # empty range
        return None

    mid = (lo + hi) // 2
    if a[mid] == x:
        return mid
    elif a[mid] < x:
        return search(a, mid + 1, hi, x)
    else:
        return search(a, lo, mid, x)
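
For example (with a small sorted sample list), we can search an entire list by passing lo = 0 and hi = len(a):

a = [3, 5, 8, 10, 12]
print(search(a, 0, len(a), 10))    # prints 3
print(search(a, 0, len(a), 4))     # prints None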

Let's write a recurrence expressing its running time as a function of N = hi - lo, which is the size of the range in which to search. The work that the function does outside the recursive calls takes constant time, i.e. O(1). The function calls itself recursively at most once. When it calls itself, the size of the range that it passes is (nearly equal to) N / 2. So we have

T(N) = O(1) + T(N / 2)

To solve this, let's again observe that O(1) is some constant C. We can write

T(N) = C + T(N / 2)
     = C + C + T(N / 4)
     = …
     = C + C + ... + C + T(1)
     = C log2(N) + T(1)
     = O(log N)

The function runs in O(log N) in the worst case. Again, this is not a surprise, since that is the same as the running time of an iterative binary search.
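
As a sketch (search_iter is just an illustrative name), such an iterative binary search over a sorted list might look like this:

# Iterative binary search for x in the sorted list a.
def search_iter(a, x):
    lo, hi = 0, len(a)
    while lo < hi:                 # search within a[lo:hi]
        mid = (lo + hi) // 2
        if a[mid] == x:
            return mid
        elif a[mid] < x:
            lo = mid + 1           # continue in the right half
        else:
            hi = mid               # continue in the left half
    return None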

These recurrences were relatively easy to solve, since they arose from recursive functions that call themselves at most once. When we study recursive functions that call themselves twice or more, we will see recurrences that are not quite as easy.

merging sorted arrays

Suppose that we have two arrays, each of which contains a sorted sequence of integers. For example:

a = [3, 5, 8, 10, 12]
b = [6, 7, 11, 15, 18]

And suppose that we'd like to merge the numbers in these arrays into a single array c containing all of the numbers in sorted order.

Actually this is not difficult. We can use integer variables i and j to point to members of a and b, respectively. Initially i = j = 0. At each step of the merge, we compare a[i] and b[j]. If a[i] < b[j], we copy a[i] into the destination array, and increment i. Otherwise we copy b[j] and increment j. The entire process will run in linear time, i.e. in O(N) where N = len(a) + len(b).

Let's write a function to accomplish this task:

# Merge sorted arrays a and b into c, given that
# len(a) + len(b) = len(c).
def merge(a, b, c):
    i = j = 0     # index into a, b
    
    for k in range(len(c)):
        if j == len(b):      # no more elements available from b
            c[k] = a[i]      # so we take a[i]
            i += 1
        elif i == len(a):    # no more elements available from a
            c[k] = b[j]      # so we take b[j]
            j += 1
        elif a[i] < b[j]:
            c[k] = a[i]      # take a[i]
            i += 1
        else:
            c[k] = b[j]      # take b[j]
            j += 1

We can use this function to merge the arrays a and b mentioned above:

a = [3, 5, 8, 10, 12]
b = [6, 7, 11, 15, 18]
c = (len(a) + len(b)) * [0]

merge(a, b, c)
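
After this call, c contains [3, 5, 6, 7, 8, 10, 11, 12, 15, 18].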

merge sort

We now have a function that merges two sorted arrays. We can use this as the basis for implementing a sorting algorithm called merge sort.

Merge sort has a simple recursive structure. To sort an array of n elements, it divides the array in two and recursively merge sorts each half. It then merges the two sorted subarrays into a single sorted array.

For example, given an array to sort, merge sort first splits the array into two halves. It then sorts each half recursively. Finally, it merges the two sorted halves back into a single sorted array.

Here's an implementation of merge sort, using our merge() function from the previous section:

def merge_sort(a):
    if len(a) < 2:
        return
        
    mid = len(a) // 2
    
    left = a[:mid]      # copy of left half of array
    right = a[mid:]     # copy of right half
    
    merge_sort(left)
    merge_sort(right)
    
    merge(left, right, a)
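
For example (using an arbitrary sample array):

a = [5, 2, 4, 7, 1, 3, 2, 6]
merge_sort(a)
print(a)    # prints [1, 2, 2, 3, 4, 5, 6, 7]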

What is the running time of merge sort? In the code above, the helper function merge runs in time O(N), where N is the length of the array c. The array slice operations a[:mid] and a[mid:] also take O(N). So if T(N) is the time to run merge_sort on an array with N elements, then we have

T(N) = 2 T(N / 2) + O(N)

We have not seen this recurrence before. Let's expand it algebraically. Let O(N) = CN for some constant C. We have

T(N) = CN + 2 T(N / 2)
     = CN + 2 (C (N / 2) + 2 T(N / 4))
     = CN + CN + 4 T(N / 4)
     = CN + CN + 4 (C (N / 4) + 2 T(N / 8))
     = CN + CN + CN + 8 T(N / 8)
     = …
     = CN + CN + … + CN + N T(1)
     = (log2 N) CN + N T(1)
     = O(N log N)

Above, the term (log2 N) appears because we must perform (log2 N) divisions by 2 to reach the value 1.

For large N, O(N log N) is much faster than O(N²), so merge sort will be far faster than insertion sort or bubble sort. For example, suppose that we want to sort 1,000,000,000 numbers, and suppose (somewhat optimistically) that we can perform 1,000,000,000 operations per second. An insertion sort might take roughly N² = 1,000,000,000 * 1,000,000,000 operations, which will take 1,000,000,000 seconds, or about 32 years. A merge sort might take roughly N log N ≈ 30,000,000,000 operations, which will take 30 seconds. This is a dramatic difference!

We've seen that merge sort is quite efficient. However, one potential weakness of merge sort is that it will not run in place: it requires extra memory beyond that of the array that we wish to sort. That's because the merge step won't run in place: it requires us to merge two arrays into an array that is held in a different region of memory. So any implementation of merge sort must make a second copy of data at some point. (Ours does so in the lines 'left = a[:mid]' and 'right = a[mid:]', since an array slice in Python creates a copy of a range of elements.)

Quicksort

Quicksort is another recursive sorting algorithm. It is efficient (typically at least twice as fast as merge sort) and is very commonly used.

Unlike merge sort, quicksort sorts the values in an array in place. To sort a slice a[lo:hi] of values in an array a, quicksort first chooses a pivot value p and rearranges the slice's elements into two partitions a[lo:k] and a[k+1:hi], for some integer k, such that every element of a[lo:k] is less than or equal to p, a[k] equals p, and every element of a[k+1:hi] is greater than p.

As a consequence, every element in the first partition (a[lo:k]) is less than or equal to a[k], which in turn is less than every element in the second partition (a[k+1:hi]). After partitioning, quicksort recursively calls itself on each of the two partitions, and then the sort is complete.

The high-level structure of quicksort is similar to merge sort: both algorithms divide the array in two and call themselves recursively on both halves. With quicksort, however, there is no more work to do after these recursive calls. Another difference is that merge sort always divides the array into two equal-sized pieces, but in quicksort the two partitions are generally not the same size, and sometimes one may be much larger than the other.

Two different partitioning algorithms are in common use. In this course we will use Lomuto partitioning. (Another common partitioning algorithm for Quicksort is Hoare partitioning, which is discussed in the Problem Solving with Algorithms text linked above.)

Lomuto partitioning works as follows. To partition the array elements a[lo:hi], we first choose the last element a[hi - 1] to use as the pivot. Let p be the pivot value a[hi - 1].

We define integer variables i and k representing array indexes. Initially i = k = lo, i.e. i and k are positioned at the left end of the region that we want to partition. We will advance i across all indices from lo to hi, building the partitions as we proceed. Our goal is that at all times, every element of a[lo:k] is less than or equal to the pivot, and every element of a[k:i] is greater than the pivot.

In other words, k is the boundary point between the partitions that we are building. Notice that these conditions are true at the beginning, since initially i = k = lo and so both partitions are empty.

On each step, we examine the next element a[i]. If a[i] is less than or equal to the pivot, it belongs in the left partition, so we swap a[k] and a[i] and then increment both k and i. Otherwise a[i] is greater than the pivot, so it already lies in the right partition, and we simply increment i.

For example, consider quicksort’s operation on this array:

2 8 7 1 3 5 6 4

The pivot value is 4, i.e. the value of the last element. Initially k = i = 0, and both partitions are empty:

2 8 7 1 3 5 6 4
k
i

On the first iteration, we see that a[i] = 2 which is less than 4, so we swap a[k] and a[i] (which does nothing), then increment both indices:

2 8 7 1 3 5 6 4
  k
  i

Now a[i] = 8 which is greater than the pivot, so we only increment i:

2 8 7 1 3 5 6 4
  k
    i

Now a[i] = 7 which is also greater than the pivot, so we increment i again:

2 8 7 1 3 5 6 4
  k
      i

Now a[i] = 1. This element belongs in the left partition, so we swap a[k] and a[i], then increment both indices:

2 1 7 8 3 5 6 4
    k
        i

Now a[i] = 3. This element also belongs in the left partition, so we swap a[k] and a[i], then increment both indices:

2 1 3 8 7 5 6 4
      k
          i

The next two values are greater than the pivot, so we increment i as we consider each of them:

2 1 3 8 7 5 6 4
      k
            i

2 1 3 8 7 5 6 4
      k
              i

Finally, the last element is the pivot value 4 which is less than or equal to itself, so we swap it with a[k], then increment:

2 1 3 4 7 5 6 8
        k
                i

We have now processed all array elements. At this point, we know that the last element of the left partition must be the pivot value itself, since we just saw the pivot at the end of the array and necessarily swapped it. So we decrement k:

2 1 3 4 7 5 6 8
      k
                i

Now the partition is complete. a[k] is the pivot element; all elements in a[:k] are less than or equal to the pivot; all elements in a[k + 1:] are greater than the pivot.

Here is a Python implementation of this partitioning algorithm. In this function, i increments automatically at the end of each loop iteration:

# Choose a pivot p and partition a[lo:hi] into three regions:
#
#   a[lo : k]     a[k]       a[k + 1 ... hi]
#     <= p         p               > p
#
# Return the value k.
#
def partition(a, lo, hi):
    pivot = a[hi - 1]
    k = lo
    for i in range(lo, hi):
        if a[i] <= pivot:     # a[i] belongs in the left partition
            a[k], a[i] = a[i], a[k]
            k += 1
    k -= 1
    return k
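
For example, running partition on the array from the trace above (with lo = 0 and hi = len(a)):

a = [2, 8, 7, 1, 3, 5, 6, 4]
k = partition(a, 0, len(a))
print(k)    # prints 3
print(a)    # prints [2, 1, 3, 4, 7, 5, 6, 8]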

With partition in place, we can easily write our top-level quicksort function:

# Sort the elements a[lo:hi].
def quicksort(a, lo, hi):
    if hi - lo <= 1:
        return
    
    k = partition(a, lo, hi)
    quicksort(a, lo, k)   # a[lo], a[lo + 1] ... a[k - 1]
    quicksort(a, k + 1, hi)  
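
For example, we can sort the array from the earlier trace like this:

a = [2, 8, 7, 1, 3, 5, 6, 4]
quicksort(a, 0, len(a))
print(a)    # prints [1, 2, 3, 4, 5, 6, 7, 8]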

performance of Quicksort

To analyze the performance of quicksort, we first observe that partition(a, lo, hi) runs in time O(N), where N = hi - lo.

In the best case, quicksort divides the input array into two equal-sized partitions at each step. Then we have

T(N) = 2 T(N / 2) + O(N)

This is the same recurrence we saw when we analyzed merge sort previously. Its solution is

T(N) = O(N log N)

In the worst case, the split is maximally unbalanced at every step: one partition is empty and the other contains all of the elements other than the pivot. Then

T(N) = T(N - 1) + O(N)

Expanding this as before (with O(N) = CN for some constant C), we get

T(N) = CN + C(N - 1) + ... + C + T(0)
     = C (N + (N - 1) + ... + 1) + T(0)
     = O(N²)

since N + (N - 1) + ... + 1 = N(N + 1) / 2.

Unfortunately, if the input array is already sorted (which is certainly possible in many real-world situations), we will have this worst-case performance, since our implementation chooses the last array element as a pivot.

For this reason, it is better to choose a random pivot. In our implementation, we can do so at the beginning of the partition() function, swapping the chosen element into the last position of the range before the partitioning loop begins. We can write

import random

def partition(a, lo, hi):
    r = random.randrange(lo, hi)           # index of a random element in a[lo:hi]
    pivot = a[r]
    a[r], a[hi - 1] = a[hi - 1], a[r]      # move the pivot into the last position
    ...

If we choose pivot elements randomly as we have done here, then quicksort’s expected performance will be O(N log N) for any input array. You may see a proof of this fact in a more advanced algorithms class.