Week 4: Notes

running time of list operations

In the last lecture we saw various operations on lists. Now that we know about big O from our algorithms class, we can discuss the running time of list operations.

Most fundamentally, getting or setting a list element by index runs in O(1), and is extremely fast. In Python, getting the length of a list via len() also runs in O(1).

Suppose that the length of a list l is N. We may insert an element x at index i by calling l.insert(i, x), and may delete the element at index i by running del l[i]. Both of these operations take O(N), since they may need to shift over many elements in memory. However, the operation del l[-1], which deletes the last list element, will run in O(1).

Now let's consider appending an element by calling l.append(x), an extremely common operation. append() runs in O(1) on average. This means that although a single call to append() may occasionally take a relatively long time to run, a sequence of N append operations to an initially empty list will always run in O(N). For example, consider this program:

n = int(input('Enter n: '))

a = []
for i in range(n):
    a.append(i)

This program will run in O(N), and so the average running time of each call to append() is O(1).

Occasionally we will write O(1)* to mean O(1) on average.

Let's consider the time to concatenate two lists using the + operator. If A is the length of list l, and B is the length of list m, then (l + m) will run in time O(A + B). Alternatively, we may write that it runs in O(|l| + |m|). The important point is that this operator always builds a new list, which takes time proportional to the new list's length.

This may have some surprising consequences. For example, consider these two programs, which differ only in their last line:

(1)

n = int(input('Enter n: '))
a = []
for i in range(n):
    a.append(i)

(2)

n = int(input('Enter n: '))
a = []
for i in range(n):
    a = a + [i]

How long will they take to run?

The first program makes N calls to append(), which runs in O(1) on average. The total time is N ⋅ O(1) = O(N).

The second program performs N list concatenations. The first builds a list of length 1; the second builds a length of 2, and so on, up to length N. Concatenating two lists to form a list of length M takes O(M). So the total time will be

O(1 + 2 + 3 + ... + N) = O(N²)

If you like, try running the two programs and entering N = 1,000,000. The first will complete instantly. The second will run for a long time. If N is 1,000,000, then a program that runs in O(N2) will be a million times slower than a program that runs in O(N) if the underlying constants are the same in both big-O functions.

The moral of the story: use append(), not concatentation, to add a single element to a list!

Finally, recall that x in l tests whether the list l contains the value x. This will run in O(N), since Python may need to scan all list elements to determine whether x is present.

lists are arrays

A Python list is actually an array, meaning a sequence of elements stored in contiguous memory locations. In fact it is a dynamic array, since it can expand over time. (In many programming languages, arrays have a fixed size).

For this reason, we can retrieve or update any element of a list by index extremely quickly, in constant time. Even if a list l has 1,000,000,000 elements, accessing e.g. l[927_774_282] will be extremely fast, just as fast as accessing the first element of a short list.

In our algorithms class, we will usually use the term "array" to describe this kind of data structure.

splitting and joining strings

The split() method is convenient for breaking strings into words. By default, it will consider words to be separated by sequences of whitespace characters, which include spaces, tabs, newlines, and other unprintable characters. It returns a list of strings:

>>> 'a big red truck'.split()
['a', 'big', 'red', 'truck']

>>> '   a     big    red   truck   '.split()
['a', 'big', 'red', 'truck']

Here's a program that reads a series of lines from standard input, each containing a series of integers separated by one or more spaces. It will print the sum of all the integers on all the input lines:

import sys

sum = 0
for line in sys.stdin:
    for word in line.split():
        sum += int(word)

print(sum)

You may optionally pass a character to split() specifying a delimiter that separates words instead of whitespace:

>>> 'big_red_truck'.split('_')
['big', 'red', 'truck']

The method join() has the opposite effect: it joins a list of strings into a single string, inserting a given separator string between each pair of strings:

>>> ' '.join(['tall', 'green', 'tree'])
'tall green tree'

>>> '_'.join(['tall', 'green', 'tree'])
'tall_green_tree'

Here's a program that reads a single line, breaks it it into words, reverses the words, then prints them back out:

words = input().split()       # break input into words
words = words[::-1]           # reverse them
print(' '.join(words))

Let's run it:

$ py rev.py
one fine day
day fine one
$ 

structural and reference equality

Suppose that we write the following declarations:

>>> l = [3, 5, 7, 9]
>>> m = l

Now the variables l and m refer to the same list. If we change l[0], then the change will be visible in m:

>>> l[0] = 33
>>> m[0]
33

This works because in fact in Python every variable is a pointer to an object. So two variables can point to the same objects, such as the list above. An assignment "m = l" does not copy a list. It runs in constant time, and is extremely fast.

Alternatively, we may make a copy of the list l. There are several possible ways to do that, all with the same effect:

>>> l = [3, 5, 7, 9]
>>> n = l.copy()      # technique 1: call the copy() method
>>> n = list(l)       # technique 2: call the list() function
>>> n = l[:]          # technique 3: use slice syntax

Now the list n has the same values as l, but it is a different list. Changes in one list will not be visible in the other:

>>> l[1] = 575
>>> l
[3, 575, 7, 9]
>>> n
[3, 5, 7, 9]

Python provides two different operators for testing equality. The first is the == operator:

>>> x == y
True
>>> x == z
True

This operator tests for structural equality. In other words, given two lists, it compares them element by element to see if they are equal. (It will even descend into sublists to compare elements there as well.)

The second equality operator is the is operator:

>>> x is y
True
>>> x is z
False

This operator tests for reference equality. In other words, it returns true only if its arguments actually refer to the same object. (Reference equality is also called physical equality).

You may want to use each of these operators in various situations. Note that is returns instantly (it runs in constant time), whereas == may traverse a list in its entirety, so it may be significantly slower.

nested lists

A list may contain any type of elements, including sublists:

>>> m = [[1], [2, 3, 4, 5], [6]]

A list of lists is a natural way to represent a matrix in Python. Consider this matrix with dimensions 3 x 3:

5  11  12
2   8   7
14  2   6

If we want to store it in Python as a list of lists, normally we will use row-major order, in which each sublist holds a row of the matrix:

m = [ [5, 11, 12], [2, 8, 7], [14, 2, 6] ]

Alternatively we could use column-major order, in which each sublist is a matrix column; then the first sublist would be [5, 2, 14]. The choice is arbitrary, but by convention we will generally use row-major order.

With this ordering, we can use the syntax m[i][j] to access the matrix element at row i, column j. Do not forget that rows and columns are numbered from 0:

>>> m = [ [5, 11, 12], [2, 8, 7], [14, 2, 6] ]
>>> m[1][0]    # row 1, column 0
2

Of course, we may use the index -1 to reference the last row or the last column:

>>> m[-1][-1] = 100
>>> m
[[5, 11, 12], [2, 8, 7], [14, 2, 100]]

Here's a program that will read a matrix from the input, with one row of numbers per input line:

# Read a matrix from the input, e.g.
#
# 2 3 4
# 5 1 8
# 0 2 9

import sys

m = []
for line in sys.stdin:
    # build a row of the matrix

    row = []
    for word in line.split():
        row.append(int(word))

    m.append(row)

print(m)

Now suppose that we want to build a zero matrix of a given size, i.e. a matrix whose elements are all 0. Recall that we may use the * operator to build a list of a given length by repeating a given element:

>>> 3 * [0]
[0, 0, 0]

So you might think that we can build e.g. a 3 x 3 matrix of zeros by

>>> m = 3 * [3 * [0]]
>>> m
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]

WARNING: This code looks correct but is not. Let's see what happens if we attempt to set the upper-left element of the matrix:

>>> m[0][0] = 7
>>> m
[[7, 0, 0], [7, 0, 0], [7, 0, 0]]

It appears that several matrix elements have changed!

Here's what's going on here: all three of the sublists are actually pointers to the same list. Here is a similar example:

>>> a = [1, 2, 3]
>>> m = [a, a]
>>> m
[[1, 2, 3], [1, 2, 3]]
>>> a[0] = 7
>>> m
[[7, 2, 3], [7, 2, 3]]

The line m = [a, a] creates a list with two elements, each of which is a pointer to the list a. When a changes, the change is visible in m.

With that understanding, let's revisit our attempt to create a 3 x 3 matrix of zeros:

>>> m = 3 * [3 * [0]]
>>> m
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]

This code constructs a single list with three zeroes (3 * [0]), and then repeats it 3 times, however the repetition does not copy the list - it merely makes three pointers to the same list. And so an update to any matrix element will actually be visible in three places in the matrix.

Here's a correct way to make a 3 x 3 matrix of zeroes:

m = []
for i in range(3):
    m.append(3 * [0])

This may seem like more work, though later in this course we'll see how we can write even this form in a single line.

function definitions

In Python, and in almost every other programming language, we may define our own functions. As a simple example, here's a function that takes an integer 'n' and a string 's', and prints the string n times:

def printout(n, s):
    for i in range(n):
        print(s)

We use the keyword def to define a function in Python. In this example, n and s are parameters to the function. When we call the function, we will pass values for n and s. Values passed to a function are sometimes called an arguments. (Programmers sometimes use the terms "parameter" and "argument" more or less interchangeably.)

Let's put this function in a file called printout.py. We can run Python on this file and specify the '-i' parameter, which means that Python should read the file and then remain in an interactive session. In that session, we can call the function as we like:

$ python -i printout.py
>>> printout(5, 'hi')
hi
hi
hi
hi
hi
>>> 

A function may return a value. For example:

def add(x, y, z):
    return x + y + z + 10

>>> add(2, 4, 6)
22

Note that the return statement will exit a function immediately, even from within the body of a loop or nested loop. This is often convenient. For example, let's write a function to determine whether an integer is prime:

def is_prime(n):
    if n < 2:
        return False
    
    for i in range(2, n):
        if n % i == 0:          # i is a factor of n
            return False        # n is composite
    
    return True # n is prime

When we wrote this code before in our algorithms class, we used a boolean variable:

prime = True
for i in range(2, n):
    if n % i == 0:
        prime = False
        break

With a function, the boolean is not needed; arguably this makes the code simpler.

If a function does not return a value explicitly, then it will return None. Also, if a return statement does not contain a value, it will return None. For example, consider this function:

def prime_check(n):
    for i in range(2, n):
        if n % i == 0:
            print('composite')
            return
        
    print('prime')

Let's call it:

>>> prime_check(14)
composite
>>> prime_check(23)
prime

In both cases the function returned None, and so the interactive interpreter didn't print any return value for the function.