Introduction to Algorithms
Lecture 4: Notes

Some of the topics we discussed today are covered in these sections of Problem Solving with Algorithms:

Here are some additional notes.

sieve of Eratosthenes

We've seen in previous lectures that we can determine whether an integer n is prime using trial division, in which we attempt to divide n by successive integers. Because we need only check divisors up to sqrt(n), this primality test runs in time O(sqrt(n)).
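As a reminder, trial division might be sketched as follows. The function name is_prime is only illustrative, and this is a minimal version rather than the exact code from the earlier lecture.

```python
def is_prime(n):
    """Return True if n is prime, testing divisors up to sqrt(n)."""
    if n < 2:
        return False           # 0 and 1 are not prime
    i = 2
    while i * i <= n:          # same as i <= sqrt(n), without floating point
        if n % i == 0:
            return False       # i divides n, so n is composite
        i += 1
    return True

print([k for k in range(2, 30) if is_prime(k)])
```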

Sometimes we may wish to generate all prime numbers up to some limit N. If we use trial division on each candidate, then we can find all these primes in time O(N sqrt(N)). But there is a faster way, using a classic algorithm called the Sieve of Eratosthenes.

It's not hard to carry out the Sieve of Eratosthenes using pencil and paper. It works as follows. First we write down all the integers from 2 to N in succession. We then mark the first integer on the left (2) as prime. Then we cross out all the multiples of 2. Now we mark the next unmarked integer on the left (3) as prime and cross out all the multiples of 3. We can now mark the next unmarked integer (5) as prime and cross out its multiples, and so on. Just as with trial division, we may stop crossing out once we reach an integer larger than sqrt(N): every composite number up to N has a prime factor no larger than sqrt(N), so it has already been crossed out. All integers that remain are prime.

Here is the result of using the Sieve of Eratosthenes to generate all primes between 2 and 30:

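2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

After crossing out the multiples of 2, 3 and 5, the integers that remain are the primes:

2 3 5 7 11 13 17 19 23 29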

We can easily implement the Sieve of Eratosthenes in Python:

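# Find all prime numbers up to (but not including) n.
# Returns a list isPrime of length n in which isPrime[i] is True
# exactly when i is prime.  Assumes n >= 2.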

def sieve(n):
    isPrime = n * [True]
    isPrime[0] = isPrime[1] = False   # 0 and 1 are not prime

    i = 2
    while i * i <= n:
        if isPrime[i]:
            for j in range(2 * i, n, i):
                isPrime[j] = False
        i += 1

    return isPrime
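The function above returns a list of booleans rather than the primes themselves. As a sketch, here is a variant (with the hypothetical name primes_below) that returns the primes as a list. It also starts crossing out at i * i instead of 2 * i, a common optimization that is not part of the code above: any smaller multiple of i has a smaller prime factor and was already crossed out.

```python
def primes_below(n):
    """Return a list of all primes p with p < n."""
    if n < 3:
        return []                          # there are no primes below 2
    is_prime = n * [True]
    is_prime[0] = is_prime[1] = False      # 0 and 1 are not prime
    i = 2
    while i * i < n:
        if is_prime[i]:
            # Multiples of i below i * i were crossed out by smaller primes.
            for j in range(i * i, n, i):
                is_prime[j] = False
        i += 1
    return [k for k in range(n) if is_prime[k]]

print(primes_below(31))
```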

How long does the Sieve of Eratosthenes take to run?

The inner loop, in which we set isPrime[j] = False, runs N/2 times when we cross out all the multiples of 2, then N/3 times when we cross out multiples of 3, and so on. So its total number of iterations will be

N(1/2 + 1/3 + 1/5 + ... + 1/p)

where p <= sqrt(N).

The series of reciprocals of primes

1/2 + 1/3 + 1/5 + 1/7 + 1/11 + 1/13 + ...

is called the prime harmonic series. How can we approximate its sum through a given element 1/p?

Euler was the first to show that the prime harmonic series diverges: its sum grows to infinity, though extremely slowly. Its partial sum through 1/p is actually close to ln (ln p). This means that the running time of the Sieve of Eratosthenes is

N(1/2 + 1/3 + 1/5 + ... + 1/p) = N * O(log log (sqrt N)) = N * O(log log N) = O(N log log N)

This is very close to O(N). (In fact more advanced algorithms can generate all primes through N in time O(N).)
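One way to get a feeling for this bound is to count how many times the inner loop body runs and compare the count with N ln (ln N). The following is only an experiment, not a proof. The helper count_crossings is a hypothetical name, and it mirrors the sieve above with a counter added.

```python
import math

def count_crossings(n):
    """Run the sieve with limit n; return how often the inner loop body executes."""
    is_prime = n * [True]
    is_prime[0] = is_prime[1] = False
    count = 0
    i = 2
    while i * i <= n:
        if is_prime[i]:
            for j in range(2 * i, n, i):
                is_prime[j] = False
                count += 1
        i += 1
    return count

for n in (10**3, 10**4, 10**5, 10**6):
    c = count_crossings(n)
    # If the running time is O(N log log N), this ratio should stay bounded.
    print(n, c, round(c / (n * math.log(math.log(n))), 3))
```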

binary search

We may wish to search for a value (sometimes called the “key”) in a sorted array. We can use an algorithm called binary search to find the key efficiently.

Binary search is related to the number-guessing game that we saw earlier. In that game, one player thinks of a number N, say between 1 and 1,000,000. If we are trying to guess it, we can first ask "Is N greater than, less than, or equal to 500,000?" If we learn that N is greater, then we now know that N is between 500,001 and 1,000,000. In our next guess we can ask to compare N to 750,000. By always guessing at the midpoint of the unknown interval, we can divide the interval in half at each step and find N in a small number of guesses.

Similarly, suppose that we are searching for the key in a sorted array of 1,000,000 elements from a[0] through a[999,999]. We can first compare the key to a[500,000]. If it is not equal, then we know which half of the array contains the key. We can repeat this process until the key is found.

To implement this in Python, we will use two integer variables lo and hi that keep track of the current unknown range. Initially lo = 0 and hi = len(a) - 1. At all times, we know that all array elements with indices < lo are less than the key, and all elements with indices > hi are greater than the key. That means that the key, if it exists in the array, must be in the range a[lo] … a[hi]. As the binary search progresses, lo and hi move toward each other. If eventually hi < lo, then the unknown range is empty, meaning that the key does not exist in the array.

Here is a binary search in Python. (Recall that a Python list is actually an array.)

# Search for value k in the list a.

lo = 0
hi = len(a) - 1

while lo <= hi:
  mid = (lo + hi) // 2
  if a[mid] == k:
    print('found', k, 'at index', mid)
    break
  elif a[mid] < k:
    lo = mid + 1
  else:  # a[mid] > k
    hi = mid - 1
else:
  print('not found')
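The same loop can be wrapped in a function so that it can be reused and tested. This is a sketch: the name binary_search and the convention of returning -1 when the key is absent are assumptions, not part of the code above.

```python
def binary_search(a, k):
    """Return an index i with a[i] == k, or -1 if k is not in the sorted list a."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == k:
            return mid
        elif a[mid] < k:
            lo = mid + 1       # the key can only be to the right of mid
        else:
            hi = mid - 1       # the key can only be to the left of mid
    return -1                  # the unknown range is empty

a = [2, 3, 5, 7, 11, 13]
print(binary_search(a, 7))     # prints 3
print(binary_search(a, 8))     # prints -1
```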

Suppose that the input array has 1,000,000,000 elements. How many times might this loop iterate?

After the first iteration, the unknown interval (a[lo] … a[hi]) has approximately 500,000,000 elements. After two iterations, it has about 250,000,000 elements, and so on. After k iterations it has about 1,000,000,000 / 2^k elements. We reach the end of the loop when 1,000,000,000 / 2^k = 1, so k = log2(1,000,000,000) ≈ 30. This means that the loop will iterate only about 30 times, which will take only a few microseconds at most.
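We can check this estimate experimentally. The sketch below uses a Python range object as a stand-in for a sorted array of 1,000,000,000 elements, since a range supports len() and indexing without storing its elements. The function count_iterations is a hypothetical helper that repeats the search loop with a counter.

```python
import math

def count_iterations(a, k):
    """Binary search for k in the sorted sequence a; return the number of loop iterations."""
    lo, hi = 0, len(a) - 1
    iterations = 0
    while lo <= hi:
        iterations += 1
        mid = (lo + hi) // 2
        if a[mid] == k:
            break
        elif a[mid] < k:
            lo = mid + 1
        else:
            hi = mid - 1
    return iterations

a = range(1_000_000_000)       # behaves like the sorted array 0, 1, 2, ...
print(math.log2(1_000_000_000))
# Try a few keys, including one (-5) that is not present.
print(max(count_iterations(a, k) for k in (0, 1, 123_456_789, 999_999_999, -5)))
```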

binary search for a boundary

Suppose that we know that a given array consists of a sequence of values with some property P followed by some sequence of values that do not have that property. Then we can use a binary search to find the boundary point dividing the values with property P from those without it.

For example, suppose that we know that an array contains a sequence of non-decreasing integers. Given a value k, we want to find the index of the first value in the array that is greater than or equal to k. Because the array is non-decreasing, it must contain a sequence of values that are less than k, followed by a sequence of values that are at least k. We want to find the boundary point between these sequences.

# Given an array a of non-decreasing values, find the first one that is >= k.

lo = 0
hi = len(a) - 1

while lo <= hi:
  mid = (lo + hi) // 2
  if a[mid] < k:
    lo = mid + 1
  else:   # a[mid] >= k
    hi = mid - 1

if lo < len(a):
  print('The first value that is at least', k, 'is', a[lo])
else:   # lo ran off the end of the array
  print('All values are less than', k)

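At every moment as the loop runs, all elements with indices < lo are less than k, and all elements with indices > hi are greater than or equal to k.

When the while loop finishes, lo and hi have crossed, so lo = hi + 1. At this point there are no unknown elements, and the boundary we are seeking lies between hi and lo. Specifically, all elements with indices <= hi are less than k, and all elements with indices >= lo are greater than or equal to k.

And so lo = hi + 1 is the index of the first value greater than or equal to k. (If lo = len(a), then every value in the array is less than k and there is no such value.)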
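Python's standard library offers this boundary search directly: bisect.bisect_left(a, k) returns the index of the first element of the sorted list a that is >= k, or len(a) if there is none. A short usage sketch, with made-up sample data:

```python
from bisect import bisect_left

a = [1, 3, 3, 5, 8, 8, 13]     # non-decreasing sample data
for k in (3, 4, 8, 20):
    i = bisect_left(a, k)      # index of the first element >= k, or len(a)
    if i < len(a):
        print('first value >=', k, 'is', a[i], 'at index', i)
    else:
        print('no value >=', k)
```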

As mentioned above, we can use this form of binary search with any property P, not just the property of being less than k. As another example, if we have an array that consists of a sequence of even integers followed by a sequence of odd integers, we can find the first odd integer using this algorithm.
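To illustrate the general form, here is a sketch in which the property P is passed in as a function. The names first_without and has_property are hypothetical. The sample array holds even integers followed by odd integers, and it need not be sorted, because the search relies only on the property.

```python
def first_without(a, has_property):
    """Return the index of the first element lacking the property, or len(a) if none.

    Assumes a consists of elements with the property followed by elements without it.
    """
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if has_property(a[mid]):
            lo = mid + 1       # the boundary is to the right of mid
        else:
            hi = mid - 1       # the boundary is at mid or to its left
    return lo

a = [4, 10, 2, 8, 7, 1, 9]     # evens, then odds
i = first_without(a, lambda x: x % 2 == 0)
print(i, a[i])                 # prints 4 7
```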

Tutorial

Here is an exercise we solved in the tutorial.

1. Largest Prime Gap

A prime gap is the difference between two consecutive prime numbers. For example, 7 and 11 are consecutive primes, and the gap between them has size 4.

Write a program that reads an integer N ≥ 3 and prints the largest prime gap among primes that are less than N. The program should print the pair of consecutive primes along with the gap size.

n = int(input('Enter n: '))

# Use the sieve of Eratosthenes to generate all primes that are less than n.

# assume all are prime at the beginning
isPrime = n * [ True ]

i = 2
while i * i <= n:    # while i <= sqrt(n)
    if isPrime[i]:
        # Cross off all the multiples of i
        for j in range(2 * i, n, i):  # 2i, 3i, 4i, ..., 
            isPrime[j] = False   # not prime
    i += 1

# Look for the largest prime gap in the range.

low, high = 2, 2    # the largest gap so far
last = 2            # the last prime we have seen so far

for i in range(3, n):
    if isPrime[i]:  # we found a new prime
        if i - last > high - low:   # we found a bigger gap
            low, high = last, i
        last = i

print('The largest gap is between primes ' + str(low) + ' and ' + str(high) +
      ', and has size ' + str(high - low))
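For testing, it can be convenient to put the same idea into a function that returns the pair of primes instead of printing it. The following is a sketch with the hypothetical name largest_prime_gap. Like the program above, it keeps the first gap found when several gaps share the largest size, and it returns (2, 2) when there are fewer than two primes below n.

```python
def largest_prime_gap(n):
    """Return (low, high): consecutive primes below n with the largest gap."""
    is_prime = n * [True]
    i = 2
    while i * i < n:
        if is_prime[i]:
            for j in range(i * i, n, i):   # cross off multiples of i
                is_prime[j] = False
        i += 1
    primes = [k for k in range(2, n) if is_prime[k]]

    best = (2, 2)                          # the largest gap so far
    for low, high in zip(primes, primes[1:]):
        if high - low > best[1] - best[0]:
            best = (low, high)
    return best

print(largest_prime_gap(30))    # prints (23, 29)
print(largest_prime_gap(100))   # prints (89, 97)
```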