Intro to Algorithms
Week 2 – Notes

Algorithms

Some introductory notes about algorithms in general:

An algorithm is a precise sequence of steps to achieve some goal.
You probably already know some algorithms from daily life, i.e. algorithms for adding/subtracting/multiplying numbers which you learned in grade school.
Different algorithms that perform the same task may differ in efficiency, as we'll see in this course.
Algorithms are a major area of research; some computer scientists spend their entire careers working on them.
Surprisingly, some problems cannot be solved by any algorithm. You may see a proof of this fact in a more advanced course.

Decimal representation

By longstanding convention, basically every society on Earth writes numbers in base 10, otherwise known as decimal. When we write a number such as 1985, we actually mean a sum of powers of 10:

1985 = 1 · 10³ + 9 · 10² + 8 · 10 + 5

In the number above, 1 is the most significant digit, i.e. the one that represents the greatest power of 10. 5 is the least significant digit.

Note that computers do not normally store integers internally in base 10. In Python, if you write

>>> x = '22387'

then internally x will be stored as a series of characters '2', '2', '3', '8', '7' becaue it's a string. However, if you write

>>> x = 22387

then it will be stored in a more efficient binary format.

We will now discuss simple algorithms that can split an integer (e.g. 1985) into a series of digits (1, 9, 8, 5), and can combine a series of digits into an integer.

Extracting digits from an integer

To split into digits, we first notice that for any integer i,

i mod 10 is the last digit of i
i div 10 is the number formed by all digits of i except the last

For example:

1985 mod 10 = 5
1985 div 10 = 198.

So we can write a simple loop that extracts digits from an integer in succession. In Python:

n = int(input('Enter n: '))

while n > 0:
    d = n % 10    # n mod 10
    n = n // 10   # n div 10
    print('digit: ', d)

For example:

Enter n: 1985
digit:  5
digit:  8
digit:  9
digit:  1

Notice that this method extracts the digits in order from the least significant to the most significant.

Combining digits

To combine a series of digits into a number, we first notice that we can append a decimal digit d to an integer n using this formula:

n' = 10 · n + d

For example, if n = 198 and d = 5, then n' = 10 · 198 + 5 = 1985.

So we can write a simple loop that reads a string containing a series of digits, and combines them into an integer:

s = input('Enter s: ')    # s is a string

n = 0
for c in s:  # loop over characters of string
    d = int(c)   # convert a single to an int = one digit

    n = n * 10
    n = n + d

print(n)

The program behaves like this:

$ py build.py
Enter s: 3288
3288
$

It may look as if this program is doing nothing! However, here's what's happening:

The input is a string, i.e. a sequence of characters '3', '2', '8', '8'. We do not call the int() function to convert it to an integer.
We loop over the characters and convert each one to an integer: 3, 2, 8, 8. As we generate each integer digit d, we multiply n by 10, then add d to n. So at first n = 0, then n = 3, then 32, then 328, and finally 3288.
Finally, we call the print() function to print this number. Internally, print() converts the number to a series of decimal digits (actually using the same digit extraction algorithm that we saw above).

Since the program's output is the same as its input, this may seem useless. However, we'll see below that we can make good use of this code.

numbers in different bases

We use base 10 by convention, but of course the number 10 is arbitrary. We would now like to work with numbers written in non-decimal bases, i.e. bases other than 10. For example, consider base 5. In the base 5 system, every digit has a value from 0 to 4. Consider the number

2031₅

Here, the subscript 5 means that the number is written in base 5.

The decimal value of the base-5 number above is

2 · 5³ + 0 · 5² + 3 · 5 + 1 = 266₁₀

Base 2 (binary) is especially common in computer programming. Here are the first natural numbers in base 2:

0₂ = 0₁₀
1₂ = 1₁₀
10₂ = 2₁₀
11₂ = 3₁₀
100₂ = 4₁₀
101₂ = 5₁₀
110₂ = 6₁₀
111₂ = 7₁₀
1000₂ = 8₁₀

We also often use base 16 (hexadecimal), in which we have extra digits A = 10, B = 11, C = 12, D = 13, E = 14, F = 15. For example,

FF₁₆ = 255₁₀ because 15 · 16¹ + 15 · 16⁰ = 240 + 15 = 255

extracting digits in any base

We'd like to now write a program that can read a decimal integer such as 43, and print its equivalent in base 2, which is 101011:

101011₂ = 43₁₀

because

1 · 2⁵ + 0 · 2⁴ + 1 · 2³ + 0 · 2² + 1 · 2 + 1 = 32 + 8 + 2 + 1 = 43

So we now need to extract digits from an integer in base 2. Above, we wrote a program that can extract digits in base 10. Here it is again:

n = int(input('Enter n: '))

while n > 0:
    d = n % 10
    n = n // 10
    print('digit:', d)

Now we only need to make a trivial change, namely to change the constant 10 to 2 in our code! That will work, because, for example:

43 mod 2 = 1 (the last digit of 43, in base 2)
43 div 2 = 21 = 10101₂(all digits except the last, in base 2)
21 mod 1 = 1 (the second-to-last digit of 43, in base 2)
...and so on.

Here's our program with that trivial change:

n = int(input('Enter n: '))

while n > 0:
    d = n % 2
    n = n // 2
    print('digit: ', d)

Let's run the program:

Enter n: 43
digit:  1
digit:  1
digit:  0
digit:  1
digit:  0
digit:  1

The digits are generated in reverse order., i.e. from least to most significant.

We can modify the program to combine the digits into a string. Because the digits are generated in reverse order, we must prepend to the string at each step to end up with the digits in the correct order.

n = int(input('Enter n: '))

s = ''
while n > 0:
    d = n % 2
    n = n // 2
    s = str(d) + s    # prepend digit to string
print('in base 2:', s)

Let's run it:

Enter n: 43
in base 2: 101011

combining digits in any base

Similarly, above we saw a program that reads a series of decimal digits and joins them into an integer:

s = input('Enter s: ')    # s is a string

n = 0
for c in s:  # loop over characters of string
    d = int(c)   # convert a single to an int = one digit

    n = n * 10
    n = n + d

print(n)

We can easily modify this program to work another base. Once again, let's simply change 10 to 2:

s = input('Enter s: ')    # s is a string

n = 0
for c in s:  # loop over characters of string
    d = int(c)   # convert a single to an int = one digit

    n = n * 2
    n = n + d

print(n)

Now we run the program:

Enter s: 101011
43

The program has read the binary digits of 101011₂ and combined them into the number 43.

converting between bases

To convert from base B to base C, you can perform these two steps in succession:

Combine the base D digits into an integer.
Extract digits in base C from the integer.

primality testing

As we know from mathematics, a prime number is an integer greater than 1 whose only factors are 1 and itself. For example, 2, 7, 47 and 101 are all prime. A integer greater than 1 that is not prime is said to be composite.

We would now like to write a program that tests whether a given number is prime. To do this, we will use a simple algorithm called trial division, which means dividing by each possible factor in turn. (By the way, there also exist more efficient algorithms for primality testing, though they are more complex. You may encounter these in more advanced courses.)

Here is a naive implementation of trial division:

n = int(input('Enter n: '))

prime = True

for i in range(2, n):  # loop from 2 .. (n - 1)
    if n % i == 0:
        prime = False
        break

if prime:
    print('n is prime')
else
    print('not prime')

This works fine, but is inefficient because it may need to test all integers from 2 up to (n – 1). When n is large, this can take a long time.

Actually for a given n we need test only the values up to √n, i.e. the square root of n. To see this, consider the following fact:

Fact. If n is composite, then it has a prime factor that is less than or equal to √n.

Proof. If n is composite, then n = ab for some integers 1 < a, b < n. Suppose that a > √n and b > √n. Then ab > √n ⋅ √n = n, which is a contradiction since ab = n. So either a ≤ √n or b ≤ √n.

It follows that if we have tested all the values from 2 through √n and none of them divide n, then n cannot be composite, so it is prime.

Here's a more efficent version of our program that stops at √n:

n = int(input('Enter n: '))

prime = True

i = 2
while i * i <= n:     # i.e. while i <= sqrt(n)
    if n % i == 0: 
        prime = False
        break 
    i += 1

if prime:
    print('prime')
else:
    print('not prime')

prime factorization

The Fundamental Theorem of Arithmetic states that

Every positive integer has a unique prime factorization.

For example:

24 = 2 ⋅ 2 ⋅ 2 ⋅ 3 = 2³ ⋅ 3¹
30 = 2 ⋅ 3 ⋅ 5 = 2¹ ⋅ 3¹ ⋅ 5¹
64 = 2 ⋅ 2 ⋅ 2 ⋅ 2 ⋅ 2 ⋅ 2 = 2⁶
100 = 2 ⋅ 2 ⋅ 5 ⋅ 5 = 2² ⋅ 5²

You can see a proof of this theorem (which is actually not completely trivial) in an introductory number theory course.

We can use trial division to factor an integer. This is similar to our primality testing algorithm, which also used trial division. Given an integer N, we first try dividing N by 2, then by 3 and so on. If n is actually divisible by a number such as 3, then we must repeatedly attempt to divide by that same number. since the same prime factor may appear multiple times in the factorization.

Here is a first program for prime factorization:

n = int(input('Enter n: '))

i = 2

while n > 1:
    if n % i == 0:
        print('factor: ', i)
        n = n // i
    else:
        i += 1

Here's the program in action:

$ py factor.py
Enter n: 60
factor:  2
factor:  2
factor:  3
factor:  5

Study the program to understand how it works. The program tries dividing by all possible values. Once n is 1, we have divided out all factors, so the factorization is complete.

Note that the program will attempt to divide by non-prime factors, such as when i = 4. But n will never be divisible by such values. That's because any non-prime factor i is itself the product of two (or more) smaller primes, and we have already divided those primes out of n, so n cannot possibly be divisible by i.

The program works, but is inefficient because it potentially tests all values from 1 to n. Perhaps we can make it more efficient by testing only a subset of those values, just like we did for the primality testing program above. We'll discuss that more in the tutorial this week.

Intro to Algorithms Week 2 – Notes

Algorithms

Decimal representation

Extracting digits from an integer

Combining digits

numbers in different bases

extracting digits in any base

combining digits in any base

converting between bases

primality testing

prime factorization

Intro to Algorithms
Week 2 – Notes