Week 6: Notes

Quicksort

Quicksort is another sorting algorithm. It is efficient (typically faster in practice than mergesort) and is commonly used.

Quicksort sorts values in an array in place - unlike mergesort, which needs to use extra memory to perform merges. Given a range a[lo:hi] of values to sort in an array a, quicksort first arranges the elements in the range into two non-empty partitions a[lo:k] and a[k:hi] for some integer k. It does this in such a way that every element in the first partition (a[lo:k]) is less than or equal to every element in the second partition (a[k:hi]). After partitioning, quicksort recursively calls itself on each of the two partitions.

The high-level structure of quicksort is similar to mergesort: both algorithms divide the array in two and call themselves recursively on both halves. With quicksort, however, there is no more work to do after these recursive calls. Sorting both partitions completes the work of sorting the array that contains them. Another difference is that mergesort always divides the array into two equal-sized pieces, but in quicksort the two partitions are generally not the same size, and sometimes one may be much larger than the other.

Two different partitioning algorithms are in common use. In this course we will use Hoare partitioning, which works as follows. To partition a range of an array, we first choose some element in the range to use as the pivot. The pivot has some value p. Once the partition is complete, all elements in the first partition will have values less than or equal to p, and elements in the second partition will have values greater than or equal to p.

To partition the elements, we first define integer variables i and j representing array indexes. Initially i is positioned at the left end of the range and j is positioned at the right. We move i rightward, looking for a value that is greater than or equal to p (i.e. it can go into the second partition). And we move j leftward, looking for a value that is less than or equal to p (i.e. it can go into the first partition). Once we have found these values, we swap a[i] and a[j]. After the swap, a[i] and a[j] hold acceptable values for the left and right partitions, respectively. Now we continue the process, moving i to the right and j to the left. Eventually i and j meet. The point where they meet is the division point between the partitions, i.e. it is the index k mentioned above.

Note that the pivot element itself may move to a different position during the partitioning process. There is nothing wrong with that. The pivot really just serves as an arbitrary value to use for separating the partitions.

For example, consider a partition operation on this array:

[array diagram: the initial 8-element array]

The array has n = 8 elements. Suppose that we choose 3 as our pivot. We begin by setting i = 0 and j = 7. a[0] = 6, which is greater than or equal to the pivot value 3, so i stops immediately. We move j leftward, looking for a value that is less than or equal to 3. We soon find a[6] = 2 ≤ 3. So we swap a[0] and a[6]:

[array diagram: the array after swapping a[0] and a[6]]

Now i moves rightward to i = 1, where a[1] = 5 ≥ 3. Then j moves leftward until j = 3, since a[3] = 3 ≤ 3. So we swap a[1] and a[3]:

[array diagram: the array after swapping a[1] and a[3]]

Now i moves rightward until it hits a value that is greater than or equal to 3; the first such value is a[3] = 5. Similarly, j moves leftward until it hits a value that is less than or equal to 3; the first such value is a[2] = 1. So i = 3 and j = 2, and j < i. i and j have crossed, so the partitioning is complete. The first partition is a[0:3] and the second partition is a[3:8]. Every element in the first partition is less than every element in the second partition.

Notice that in this example the pivot element itself moved during the partitioning process. As mentioned above, this is common.

Here is an implementation of Hoare partitioning in Python:

from random import randrange

# We are given an unsorted range a[lo:hi] to be partitioned, with
# hi - lo >= 2 (i.e. at least 2 elements in the range).
#
# Choose a pivot value p, and rearrange the elements in the range into
# two partitions a[lo:k] (with values <= p) and a[k:hi] (with values >= p).
# Both partitions must be non-empty.
#
# Return k, the index of the first element in the second partition.
def partition(a, lo, hi):
    # Choose a random element as the pivot, avoiding the first element
    # (see the discussion below).
    p = a[randrange(lo + 1, hi)]

    i = lo
    j = hi - 1

    while True:
        # Look for two elements to swap.
        while a[i] < p:
            i += 1
        while a[j] > p:
            j -= 1

        if j <= i:             # nothing to swap
            return i 
        
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1

At all times as the code above runs, every element in a[lo:i] is less than or equal to p, and every element in a[j+1:hi] is greater than or equal to p: the elements that i has skipped over were less than p, any element swapped leftward was at most p, and the symmetric statement holds for j.

The code that decides when to exit the loop and computes the return value is slightly tricky. Let's distinguish two cases that are possible when we hit the test 'if j <= i':

If j == i, then both inner loops stopped at the same element, so a[i] ≥ p and a[i] ≤ p, which means a[i] = p. Combined with the invariant above, every element in a[lo:i] is at most p and every element in a[i:hi] is at least p.

If j == i - 1, then the invariant directly tells us that every element in a[lo:i] is at most p and every element in a[j+1:hi] = a[i:hi] is at least p.

In summary, in either of these cases i is a correct value to return.

With partition in place, we can easily write our top-level quicksort function:

# Sort a[lo:hi].
def qsort(a, lo, hi):
    if hi - lo <= 1:
        return

    k = partition(a, lo, hi)
    qsort(a, lo, k)
    qsort(a, k, hi)

def quicksort(a):
    qsort(a, 0, len(a)) 
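To gain confidence that the implementation is correct, we can sort some arrays and compare the results with Python's built-in sorted(). The snippet below repeats the functions from above so that it can be run on its own:

```python
from random import randrange

# partition(), qsort() and quicksort() as defined above.
def partition(a, lo, hi):
    p = a[randrange(lo + 1, hi)]    # random pivot, avoiding the first element
    i = lo
    j = hi - 1
    while True:
        while a[i] < p:
            i += 1
        while a[j] > p:
            j -= 1
        if j <= i:
            return i
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1

def qsort(a, lo, hi):
    if hi - lo <= 1:
        return
    k = partition(a, lo, hi)
    qsort(a, lo, k)
    qsort(a, k, hi)

def quicksort(a):
    qsort(a, 0, len(a))

# Sort an example array.
a = [6, 5, 1, 3, 8, 4, 2, 7]
quicksort(a)
assert a == [1, 2, 3, 4, 5, 6, 7, 8]

# Sort many random arrays (including empty and 1-element arrays)
# and compare each result with sorted().
for trial in range(1000):
    a = [randrange(50) for _ in range(randrange(40))]
    expected = sorted(a)
    quicksort(a)
    assert a == expected
```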

performance of Quicksort

To analyze the performance of quicksort, first consider the running time of the helper function partition(a, lo, hi) as a function of N = hi - lo. The first inner while loop has body "i += 1". Certainly this statement can run at most N times, because i will always remain within the range lo .. hi. So the total running time of this inner while loop is O(N). Similarly, the total running time of the second inner while loop with body "j -= 1" is also O(N). The outer 'while True' loop iterates at most N times, and its body (excepting the inner loops, which we have already considered) runs in O(1) on each iteration. And so the total running time of partition() is O(N).

Now let's consider the running time of the recursive function qsort(a, lo, hi) as a function of N = hi - lo. In the best case, quicksort divides the input array into two equal-sized partitions at each step. Then we have

T(N) = 2 T(N / 2) + O(N)

This is the same recurrence we saw when we analyzed mergesort previously! As we know, its solution is

T(N) = O(N log N)

In the worst case, at each step one partition has a single element and the other partition has all remaining elements. Then

T(N) = T(N – 1) + O(N)

This yields

T(N) = O(N + (N - 1) + ... + 1) = O(N²)

since the sum 1 + 2 + ... + N equals N (N + 1) / 2.

Now suppose that instead of choosing random pivot elements, we always chose the second element of the range as the pivot (since we must avoid the first):

p = a[lo + 1]

And suppose that the input array is already sorted (which is certainly possible in many real-world situations). Then in each partition operation, the left partition would contain only a single element, and the right partition would contain all the remaining elements. This would cause the worst-case behavior above, and the sort would run in O(N²). Similarly, if we always chose the last element as a pivot, we would also get worst-case behavior if the input array is already sorted.

For this reason, it is better to choose a random pivot as we did in our implementation above.

With pivot elements chosen at random, the worst-case running time of quicksort is still O(N²), but the expected running time is O(N log N), and the chance of quadratic behavior is astronomically small. (You may see a proof of this in ADS 1 next semester.)

Here's one more small point. In the partition function, when we increment i and decrement j we stop when we see a value that is equal to the pivot. We will then swap that value into the other partition. Is this necessary? Could we simply skip over values that are equal to the pivot?

Suppose that we modified the code to look like this:

        while a[i] <= p:
            i += 1
        while a[j] >= p:
            j -= 1

And suppose that we are sorting a range of values a[lo:hi] that are all the same, e.g. [4, 4, 4, 4]. The pivot will be 4. Now we will increment i until it runs past the end of the array on the right side, which is an error. For this reason, we must use strict inequalities (<, >) in these while conditions.
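We can observe this effect directly, without actually crashing, by simulating the modified scan on an all-equal array with a bounds check added so that the loop stops instead of raising an IndexError:

```python
a = [4, 4, 4, 4]
p = 4   # the pivot: every element equals it

# The modified loop 'while a[i] <= p: i += 1', with a bounds check
# added so that we can observe where i ends up.
i = 0
while i < len(a) and a[i] <= p:
    i += 1

# Every element is <= p, so nothing stops the scan: i has run past
# the last valid index.
assert i == len(a)   # i is 4, an out-of-range index
```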

abstract data types

An abstract data type specifies a set of operations that an object may provide. In the next few lectures of this course, we will see various abstract data types including stacks, queues, sets, and dictionaries.

We will see that for most abstract data types there are different possible implementations using various data structures. We'll also see that the big-O running time of operations will often vary between implementations.

stacks

A stack is an abstract data type that provides two operations called 'push' and 'pop'. push(x) adds a value x to a stack, and pop() removes the value that was most recently pushed and returns it. This is like a stack of sheets of paper on a desk, where sheets can be added or removed at the top.

In other words, a stack is a last in first out (LIFO) data structure: the last element that was added is the first to be removed.

Often a stack will provide an additional operation is_empty() that returns true if it contains no elements.

Before we implement a stack, let's look at how one can be used:

# initially 's' is an empty stack
s.push(4)
s.push(8)
s.push(1)
  
while not s.is_empty():
    print(s.pop())

This code will write

1
8
4

implementing a stack with an array

An easy way to implement a stack in Python is using a list, i.e. an array:

class ArrayStack:
    def __init__(self):
        self.a = []

    def is_empty(self):
        return len(self.a) == 0
  
    def push(self, x):
        self.a.append(x)
  
    def pop(self):
        assert not self.is_empty(), 'stack is empty'
        return self.a.pop()

The implementation is straightforward. The list contains all stack elements, with the top of the stack (i.e. the most recently pushed element) at the end of the list.

With this implementation, 'push' will run in O(1) on average, since that is the running time of append(). 'pop' will always run in O(1).
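As a quick check, here is the usage example from earlier in this section run against ArrayStack (the class is repeated so the snippet is self-contained):

```python
class ArrayStack:
    def __init__(self):
        self.a = []

    def is_empty(self):
        return len(self.a) == 0

    def push(self, x):
        self.a.append(x)

    def pop(self):
        assert not self.is_empty(), 'stack is empty'
        return self.a.pop()

s = ArrayStack()
s.push(4)
s.push(8)
s.push(1)

while not s.is_empty():
    print(s.pop())   # prints 1, then 8, then 4
```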

linked lists

Let's now study a broadly useful data structure that we might alternatively use to implement a stack, namely a linked list. A linked list looks like this:

[diagram: three nodes holding the values 2, 4 and 7; each node points to the next, and the last points to None]

An element of a linked list is called a node. A node contains one or more values, plus a pointer to the next node in the list. The first node of a linked list is called its head, and the last node is its tail. The tail's next pointer is always None (in Python; other languages use an equivalent value such as nil).

By the way, we will sometimes illustrate a linked list more compactly:

2 → 4 → 7 → None

The two pictures above denote the same structure; the first is simply more detailed.

Note that a Python list is not a linked list! A Python list is an array. :)

Here is a node type for a linked list in Python:

class Node:
    def __init__(self, val, next):
        self.val = val
        self.next = next

We can build the 3-element linked list pictured above as follows:

>>> r = Node(7, None)
>>> q = Node(4, r)
>>> p = Node(2, q)

Now p refers to the head of the list, and q and r refer to successive elements.

Through p, we can get to the values in the list:

>>> p.val
2
>>> p.next.val
4
>>> p.next.next.val
7

We can traverse a linked list using a loop that moves to the next list node on each iteration. Here's a function that takes a pointer to the head of a linked list, and returns its length:

def list_len(head):
    n = head
    count = 0
    while n != None:
        n = n.next      # move to the next node
        count += 1
    return count

By modifying the function only slightly, we can write a function that computes the sum of all values in a linked list:

def list_sum(head):
    n = head
    total = 0
    while n != None:
        total += n.val
        n = n.next      # move to the next node
    return total

Let's now write a function that takes an integer n and constructs a linked list containing the values from 1 to n. One easy way to do that is to build the list in reverse order, just like when we built a list above with the values 2, 4, and 7. On each step, we will prepend a node to the existing list.

def list_1_to_n(n):
    head = None
    for i in range(n, 0, -1):       # n, n - 1, ... 1
        p = Node(i, head)
        head = p
    return head

As this function runs, the local variable 'head' points to the beginning of the list that we've built so far. On each loop iteration, the function allocates a new node by calling the Node constructor function. As it does so, it makes the new node point to the existing list, by passing 'head' as the value for the 'next' attribute. After that, it modifies 'head' to point to the newly allocated node.
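We can combine list_1_to_n() with the functions defined earlier to check that everything fits together (the definitions are repeated so the snippet runs on its own):

```python
class Node:
    def __init__(self, val, next):
        self.val = val
        self.next = next

def list_len(head):
    n = head
    count = 0
    while n != None:
        n = n.next      # move to the next node
        count += 1
    return count

def list_sum(head):
    n = head
    total = 0
    while n != None:
        total += n.val
        n = n.next      # move to the next node
    return total

def list_1_to_n(n):
    head = None
    for i in range(n, 0, -1):   # n, n - 1, ... 1
        head = Node(i, head)    # prepend a node to the list
    return head

head = list_1_to_n(100)
assert list_len(head) == 100
assert list_sum(head) == 5050   # 1 + 2 + ... + 100 = 100 * 101 / 2
```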

We can write many more functions that operate on linked lists, performing operations such as inserting nodes, deleting nodes, and so on. To get practice with this, we will solve various linked list exercises in our tutorials.

implementing a stack using a linked list

We can implement a stack using a linked list. Our class LinkedStack will have an attribute 'head' that points to the head of the list.

The stack operations are fairly straightforward:

class LinkedStack:
    def __init__(self):
        self.head = None
    
    def is_empty(self):
        return self.head == None
  
    def push(self, x):
        n = Node(x, self.head)
        self.head = n
  
    def pop(self):
        assert self.head != None, 'stack is empty'
        x = self.head.val
        self.head = self.head.next
        return x

Notice that the push() method allocates a node n that points to the previous head, then records n as the new list head. In other words, it prepends a new node to the list. This is similar to the activity in the list_1_to_n() function above.

If we implement a stack using an array (i.e. a Python list), the push operation will take O(1) on average but O(N) in the worst case, where N is the current stack size. Our linked list-based implementation performs differently: push always runs in O(1) (assuming that object allocations run in constant time).

queues

Queues are another important abstract data type. A queue provides two operations called 'enqueue' and 'dequeue'. enqueue(x) adds an element to the tail of a queue, and dequeue() removes the element at the head and returns it. A queue is something like people waiting in a line: you must join at the back of the line, and the person at the front of the line is served next.

In other words, queues are a first in first out (FIFO) data structure: the first value added to a queue will be the first one to be removed.

A queue can be used like this:

# initially 'q' is an empty queue

q.enqueue(4)
q.enqueue(77)
q.enqueue(12)

print(q.dequeue())  # writes 4

print(q.dequeue())  # writes 77

implementing a queue with an array

A naive implementation of a queue in Python will store elements in an array, i.e. a Python list. It might look like this:

class ArrayQueue:
    def __init__(self):
        self.a = []
    
    def is_empty(self):
        return len(self.a) == 0
    
    def enqueue(self, x):
        self.a.append(x)

    def dequeue(self):
        assert not self.is_empty(), 'queue is empty'
        return self.a.pop(0)

With this implementation, is_empty() will run in O(1), and enqueue() will run in O(1) on average. However, dequeue() will run in O(N), because all array elements must shift leftward by one position when the first element is deleted. So this is a poor choice of implementation if N might be large.
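Functionally, ArrayQueue does behave as a FIFO queue; only its performance is poor. Here is a quick check, with the class repeated so the snippet is self-contained:

```python
class ArrayQueue:
    def __init__(self):
        self.a = []

    def is_empty(self):
        return len(self.a) == 0

    def enqueue(self, x):
        self.a.append(x)

    def dequeue(self):
        assert not self.is_empty(), 'queue is empty'
        return self.a.pop(0)   # O(N): shifts all remaining elements

q = ArrayQueue()
q.enqueue(4)
q.enqueue(77)
q.enqueue(12)

print(q.dequeue())   # prints 4
print(q.dequeue())   # prints 77
print(q.dequeue())   # prints 12
```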

implementing a queue using a linked list

We can implement a queue more efficiently using a linked list. To do so, we will keep pointers to both the head (first node) and tail (last node) of the list, which will allow us to enqueue or dequeue elements in O(1).

Here is an implementation:

class LinkedQueue:
    def __init__(self):
        self.head = self.tail = None
  
    def is_empty(self):
        return self.head == None
  
    def enqueue(self, x):
        n = Node(x, None)
        if self.tail == None:
            self.head = self.tail = n
        else:
            self.tail.next = n
            self.tail = n
      
    def dequeue(self):
        assert self.head != None, 'queue is empty'
        x = self.head.val
        self.head = self.head.next
        if self.head == None:
            self.tail = None
        return x
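Finally, a quick check that LinkedQueue delivers elements in FIFO order even when enqueues and dequeues are interleaved (the Node class and a copy of the queue class are included so the snippet runs on its own):

```python
class Node:
    def __init__(self, val, next):
        self.val = val
        self.next = next

class LinkedQueue:
    def __init__(self):
        self.head = self.tail = None

    def is_empty(self):
        return self.head == None

    def enqueue(self, x):
        n = Node(x, None)
        if self.tail == None:
            self.head = self.tail = n
        else:
            self.tail.next = n
            self.tail = n

    def dequeue(self):
        assert self.head != None, 'queue is empty'
        x = self.head.val
        self.head = self.head.next
        if self.head == None:
            self.tail = None    # the queue is now empty again
        return x

q = LinkedQueue()
q.enqueue(4)
q.enqueue(77)
assert q.dequeue() == 4

# Interleave: enqueue more after dequeuing.
q.enqueue(12)
assert q.dequeue() == 77
assert q.dequeue() == 12
assert q.is_empty()

# The queue is reusable after being emptied, because dequeue()
# resets 'tail' to None along with 'head'.
q.enqueue(5)
assert q.dequeue() == 5
```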