Intro to Algorithms, 2019-20
Lecture 2 – Notes

numbers in different bases

By longstanding convention, basically every society on Earth writes numbers in base 10, otherwise known as decimal. When we write a number such as

2736

we mean

2 · 10³ + 7 · 10² + 3 · 10 + 6

In base 10, every digit has a value from 0 to 9.

Of course, the number 10 is arbitrary. We would now like to work with numbers written in non-decimal bases, i.e. bases other than 10. For example, consider base 5. In the base 5 system, every digit has a value from 0 to 4. Consider the number

2031₅

Here, the subscript 5 means that the number is written in base 5. In this course, you may assume that any number written without a subscript is in base 10.

The decimal value of the base-5 number above is

2 · 5³ + 0 · 5² + 3 · 5 + 1 = 266₁₀

Base 2 (binary) is especially common in computer programming. Here are the first natural numbers in base 2:
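The sequence is easy to regenerate. As a minimal sketch, assuming Python's built-in format() with the 'b' format specifier (the helper name binary_table is made up for this illustration), the following prints each natural number from 0 to 8 next to its base-2 form:

```python
# Hypothetical helper: list the numbers 0..limit with their base-2 digits.
def binary_table(limit):
    """Return a list of (n, binary string) pairs for n = 0 .. limit."""
    return [(n, format(n, 'b')) for n in range(limit + 1)]

for n, bits in binary_table(8):
    print(n, bits)
```

Notice that a new digit appears each time we reach a power of 2 (at 2, 4 and 8), just as a new decimal digit appears at 10, 100 and 1000.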

We also often use base 16 (hexadecimal), in which we have extra digits a = 10, b = 11, c = 12, d = 13, e = 14, f = 15. For example,

ff₁₆ = 255₁₀ because 15 · 16¹ + 15 · 16⁰ = 240 + 15 = 255

Internally, computers usually do not store numbers in base 10. At the hardware level, they are actually stored in binary (though this is invisible even to low-level programmers). Python functions such as print() internally perform arithmetic operations to produce decimal digits from a number. Similarly, functions such as int() use arithmetic to join decimal digits into a number.
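The conversions worked out by hand above (2031 in base 5 and ff in base 16) can be checked directly in Python. This is only a quick sketch: it relies on the optional second argument of int(), which gives the base of a digit string, and on the 0x prefix for hexadecimal literals. The dictionary name values is just for illustration.

```python
# int(string, base) interprets the string's digits in the given base (2 to 36).
values = {
    'base 5': int('2031', 5),    # 2 * 125 + 0 * 25 + 3 * 5 + 1
    'base 16': int('ff', 16),    # 15 * 16 + 15
    'hex literal': 0xff,         # the same number written as a literal
}

for label, value in values.items():
    print(label, value)          # print() shows each value in decimal
```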

generating digits in different bases

In the previous lecture, we saw algorithms that can split an integer into decimal digits, and join a series of decimal digits to form an integer.

Let's now generalize those algorithms to work with non-decimal bases. Actually the change is trivial: we simply change the constant 10 in our code!

For example, previously we saw this program to print a number's decimal digits:

n = int(input('Enter n: '))

while n > 0:
  d = n % 10
  n = n // 10
  print('digit:', d)

Let's modify it to print a number's binary digits. We only need to change the constant 10 to 2:

n = int(input('Enter n: '))

while n > 0:
  d = n % 2
  n = n // 2
  print('digit:', d)

Let's run the program:

Enter n: 43
digit: 1
digit: 1
digit: 0
digit: 1
digit: 0
digit: 1

The digits are generated in reverse order. The program's output means that

101011₂ = 43₁₀

This is because

1 · 2⁵ + 0 · 2⁴ + 1 · 2³ + 0 · 2² + 1 · 2 + 1 = 32 + 8 + 2 + 1 = 43

We can modify the program to print the result as a string:

n = int(input('Enter n: '))

s = ''
while n > 0:
  d = n % 2
  n = n // 2
  s = str(d) + s    # prepend digit to string
print('in base 2:', s)
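Since only the constant changes from base to base, we can turn it into a parameter. The following is one possible sketch, not the only way to do it. The names to_base and DIGITS are made up here, and the special case for 0 is an addition: the loop above produces an empty string when n is 0.

```python
DIGITS = '0123456789abcdef'  # digit symbols for bases up to 16 (assumed limit)

def to_base(n, base):
    """Return the digits of a non-negative integer n in the given base."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError('base must be between 2 and 16')
    if n == 0:
        return '0'           # the loop below would never run for n = 0
    s = ''
    while n > 0:
        d = n % base         # lowest digit in this base
        n = n // base        # drop that digit
        s = DIGITS[d] + s    # prepend digit to string
    return s

print(to_base(43, 2))    # 101011
print(to_base(255, 16))  # ff
print(to_base(266, 5))   # 2031
```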

combining digits in different bases

Similarly, last week we saw a program that reads a series of decimal digits and joins them into an integer. Let's rewrite that program using for to loop over sys.stdin:

import sys

n = 0
for line in sys.stdin:
  digit = int(line)
  n = 10 * n + digit
print('n is', n)

We can easily modify this program to work in another base. Once again, let's change 10 to 2:

import sys

n = 0
for line in sys.stdin:
  digit = int(line)
  n = 2 * n + digit
print('n is', n)

Now we run the program:

$ py hello.py
1
0
1
0
1
1
n is 43

The program has read the binary digits of 101011₂ and joined them into the number 43.
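To see why multiplying by the base at each step works, it can help to watch n grow as each digit arrives. Here is a small sketch that takes the digits from a list instead of standard input and records n after every step. The function name join_digits is hypothetical.

```python
def join_digits(digits, base):
    """Join digits (most significant first) into an integer.

    Returns the list of values that n takes after each digit is read.
    """
    n = 0
    steps = []
    for digit in digits:
        n = base * n + digit   # shift existing digits left, then add the new one
        steps.append(n)
    return steps

print(join_digits([1, 0, 1, 0, 1, 1], 2))  # [1, 2, 5, 10, 21, 43]
```

Each intermediate value is the number formed by the digits read so far: 1₂ = 1, 10₂ = 2, 101₂ = 5, and so on up to 101011₂ = 43.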

converting between bases

To convert from base B to base C, all we need to do is read a number's digits in base B, then generate digits in base C. We can easily do that by combining the programs in the two previous sections, as you may wish to do as an exercise.
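If you would like to compare your solution with something, here is one possible sketch. It works on digit strings rather than on lines of input, and it assumes bases between 2 and 16 with the hexadecimal digit symbols introduced earlier. The names convert and DIGITS are made up for this example.

```python
DIGITS = '0123456789abcdef'  # digit symbols, assuming bases 2 through 16

def convert(s, b, c):
    """Convert the digit string s from base b to base c."""
    # Step 1: read the digits in base b and join them into an integer.
    n = 0
    for ch in s:
        digit = DIGITS.index(ch.lower())   # raises ValueError for unknown symbols
        if digit >= b:
            raise ValueError('digit ' + ch + ' is not valid in base ' + str(b))
        n = b * n + digit

    # Step 2: generate the digits of that integer in base c.
    if n == 0:
        return '0'
    out = ''
    while n > 0:
        out = DIGITS[n % c] + out          # prepend the lowest digit
        n //= c
    return out

print(convert('101011', 2, 10))  # 43
print(convert('ff', 16, 2))      # 11111111
print(convert('2031', 5, 16))    # 10a
```

The integer n in the middle has no base of its own; bases only matter when we read or write digits.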

primality testing

As we know from mathematics, a prime number is an integer greater than 1 whose only factors are 1 and itself. For example, 2, 7, 47 and 101 are all prime.

We would now like to write a program that tests whether a given number is prime. To do this, we will use a simple algorithm called trial division, which means dividing by each possible factor in turn. (By the way, there also exist more efficient (and complex) algorithms for primality testing; you may encounter these in more advanced courses.)

Here is a naive implementation of trial division:

n = int(input('Enter n: '))

prime = n > 1  # integers less than 2 are not prime
for i in range(2, n):  # loop from 2 .. (n - 1)
  if n % i == 0:
    prime = False
    break

if prime:
  print('n is prime')
else:
  print('not prime')

This works fine, but is inefficient because it may need to test all integers from 2 up to (n – 1). When n is large, this can take a long time.

Actually for a given n we need test only the values up to sqrt(n). To see this, consider the following fact. If ab = n for integers a and b, then either a ≤ sqrt(n) or b ≤ sqrt(n). Proof: Suppose that a > sqrt(n) and b > sqrt(n). Then ab > sqrt(n) ⋅ sqrt(n) = n, a contradiction. So either a ≤ sqrt(n) or b ≤ sqrt(n).

It follows that if we have tested all the values from 2 through sqrt(n) and none of them divide n, then if ab = n we must have a = 1 or b = 1. And so n is prime.
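We can also check this fact experimentally. The sketch below lists every way of writing n as a product ab with a ≤ b, found by brute force, and confirms that the smaller factor never exceeds sqrt(n). The function name factor_pairs is hypothetical, and the comparison a * a <= n is used in place of a <= sqrt(n) to stay within integer arithmetic.

```python
def factor_pairs(n):
    """Return all pairs (a, b) with a <= b and a * b == n, by brute force."""
    return [(a, n // a) for a in range(1, n + 1)
            if n % a == 0 and a <= n // a]

pairs = factor_pairs(36)
print(pairs)  # [(1, 36), (2, 18), (3, 12), (4, 9), (6, 6)]

# In every pair, the smaller factor a satisfies a * a <= n, i.e. a <= sqrt(n).
print(all(a * a <= 36 for a, b in pairs))  # True
```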

So we can make our program much more efficient by replacing the statement

for i in range(2, n):

with

for i in range(2, int(math.sqrt(n)) + 1):

Note that int rounds a non-negative float down to the nearest integer. Without this function call, the code will fail, since math.sqrt returns a float but range expects the upper bound to be an integer. The program must also begin with import math so that math.sqrt is available.
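Here is the improved test packaged as a function, as a sketch only. It makes two changes that are not in the program above: it rejects numbers below 2 explicitly, and it uses math.isqrt (available from Python 3.8) instead of int(math.sqrt(n)). math.isqrt computes the integer square root exactly, which avoids floating-point rounding trouble for very large n. The function name is_prime is made up here.

```python
import math

def is_prime(n):
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False                          # 0, 1 and negatives are not prime
    for i in range(2, math.isqrt(n) + 1):     # candidates 2 .. floor(sqrt(n))
        if n % i == 0:
            return False                      # found a factor, so n is composite
    return True

print([n for n in range(30) if is_prime(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

For n around one million, this loop runs at most about 1000 times, compared with about a million times for the naive version.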

prime factorization

The Fundamental Theorem of Arithmetic states that

Every positive integer has a unique prime factorization.

For example, 60 = 2² · 3 · 5 and 1001 = 7 · 11 · 13.

You can see a proof of this theorem in an introductory number theory course.

We can use trial division to factor an integer. This is similar to our primality testing algorithm, which also used trial division. Given an integer n, we first try dividing n by 2, then by 3 and so on. If n is actually divisible by a number such as 3, then we must repeatedly attempt to divide by that same number, since the same prime factor may appear multiple times in the factorization.

Here is a first program for prime factorization:

n = int(input('Enter n: '))

i = 2
  
while i < n:
  if n % i == 0:
    print(i, end = ' ')  # print ' ' between numbers, stay on same line
    n //= i
  else:
    i += 1
    
print(n)

Study the program to understand how it works. The program divides out every factor i that is less than the current value of n. Once they are all gone, the remaining value of n is the last prime factor, so we print it at the end.

Note that the program will attempt to divide by non-prime factors, such as when i = 4. But n will never be divisible by such values. That's because any non-prime factor i is itself the product of two (or more) smaller primes, and we have already divided those primes out of n, so n cannot possibly be divisible by i.
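If you are not convinced, you can test this claim by experiment. The sketch below runs the same loop but collects every i that actually divided n, then checks with a separate brute-force test that each of them is prime. Both function names (divisors_found and is_prime_slow) are hypothetical.

```python
def divisors_found(n):
    """Run the trial-division loop; return the divisors used and the final n."""
    found = []
    i = 2
    while i < n:
        if n % i == 0:
            found.append(i)   # i divided the current n
            n //= i
        else:
            i += 1
    return found, n

def is_prime_slow(k):
    """Brute-force primality check, used only to verify the claim."""
    return k >= 2 and all(k % j != 0 for j in range(2, k))

print(divisors_found(360))  # ([2, 2, 2, 3, 3], 5)

# Every divisor the loop ever uses is prime, for all n from 2 to 499.
print(all(is_prime_slow(i)
          for n in range(2, 500)
          for i in divisors_found(n)[0]))  # True
```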

The program works, but is inefficient because it potentially tests all values from 2 to n. Just as in our primality testing algorithm, we can make the program much more efficient by stopping our loop once i exceeds sqrt(n).

We can see that this is valid using the same argument as for primality testing above. Once again, if ab = n for integers a and b, then we must have either a ≤ sqrt(n) or b ≤ sqrt(n). Proof: Suppose that a > sqrt(n) and b > sqrt(n). Then ab > sqrt(n) ⋅ sqrt(n) = n, a contradiction. So either a ≤ sqrt(n) or b ≤ sqrt(n).

It follows that if we have tested all the values from 2 through sqrt(n) and none of them divide n, then if ab = n we must have a = 1 or b = 1. And so n must be prime. Therefore we can end the loop and simply print n itself, which must be the last prime factor.

Here is the updated program:

import math

n = int(input('Enter n: '))

i = 2

while i <= math.sqrt(n):
  if n % i == 0:
    print(i, end = ' ')
    n //= i
  else:
    i += 1

print(n)
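For experimenting, it can be convenient to have the factors in a list instead of printed. The following sketch makes that change. It also writes the loop condition as i * i <= n, which is equivalent to i <= sqrt(n) for positive integers and avoids floating point. The function name factorize is made up here, and the code assumes n is at least 2.

```python
def factorize(n):
    """Return the prime factors of n (assumed >= 2) in non-decreasing order."""
    factors = []
    i = 2
    while i * i <= n:         # same bound as i <= sqrt(n), in integer arithmetic
        if n % i == 0:
            factors.append(i)
            n //= i           # divide out this factor and try i again
        else:
            i += 1
    factors.append(n)         # what remains is the last prime factor
    return factors

print(factorize(360))   # [2, 2, 2, 3, 3, 5]
print(factorize(1001))  # [7, 11, 13]
print(factorize(97))    # [97]
```

Multiplying the returned factors back together always gives the original number, which is a handy way to test the function.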