In the last lecture we saw various operations on lists. Now that we know about big O from our algorithms class, we can discuss the running time of list operations.
Most fundamentally, getting or setting a list element by index runs in O(1), and is extremely fast. In Python, getting the length of a list via len() also runs in O(1).
Suppose that the length of a list l
is N. We may insert an element x at index i by calling l.insert(i,
x), and may delete the element at index i by running del
l[i]. Both of these operations take O(N), since they
may need to shift over many elements in memory. However, the
operation del l[-1], which deletes the
last list element, will run in O(1).
Now let's consider appending an element by calling
l.append(x), an extremely common
operation. append() runs in O(1) on average. This means that
although a single call to append() may occasionally take a relatively
long time to run, a sequence of N
append operations to an initially empty list will always run
in O(N). For example, consider this program:
n = int(input('Enter n: '))
a = []
for i in range(n):
a.append(i)This program will run in O(N), and so the average running time of each call to append() is O(1).
Occasionally we will write O(1)* to mean O(1) on average.
Let's consider the time to concatenate two lists
using the + operator. If A is the length of list l,
and B is the length of list m, then
(l + m) will run in time O(A +
B). Alternatively, we may write that it runs in O(|l|
+ |m|). The important point is that this
operator always builds a new list, which takes time
proportional to the new list's length.
This may have some surprising consequences. For example, consider these two programs, which differ only in their last line:
(1)
n = int(input('Enter n: '))
a = []
for i in range(n):
a.append(i)(2)
n = int(input('Enter n: '))
a = []
for i in range(n):
a = a + [i]How long will they take to run?
The first program makes N calls to append(), which runs in O(1) on average. The total time is N ⋅ O(1) = O(N).
The second program performs N list concatenations. The first builds a list of length 1; the second builds a length of 2, and so on, up to length N. Concatenating two lists to form a list of length M takes O(M). So the total time will be
O(1 + 2 + 3 + ... + N) = O(N²)
If you like, try running the two programs and entering N = 1,000,000. The first will complete instantly. The second will run for a long time. If N is 1,000,000, then a program that runs in O(N2) will be a million times slower than a program that runs in O(N) if the underlying constants are the same in both big-O functions.
The moral of the story: use append(), not concatentation, to add a single element to a list!
Finally, recall that x in l
tests whether the list l contains the value x. This will run in O(N),
since Python may need to scan all list elements to determine whether
x is present.
A Python list is actually an array, meaning a sequence of elements stored in contiguous memory locations. In fact it is a dynamic array, since it can expand over time. (In many programming languages, arrays have a fixed size).
For this
reason, we can retrieve or update any element of a list by index
extremely quickly, in constant
time. Even if a
list l has 1,000,000,000 elements, accessing e.g. l[927_774_282]
will be extremely fast, just
as fast as accessing the first element of a short list.
In our algorithms class, we will usually use the term "array" to describe this kind of data structure.
The split() method is convenient for breaking strings into words. By default, it will consider words to be separated by sequences of whitespace characters, which include spaces, tabs, newlines, and other unprintable characters. It returns a list of strings:
>>> 'a big red truck'.split() ['a', 'big', 'red', 'truck'] >>> ' a big red truck '.split() ['a', 'big', 'red', 'truck']
Here's a program that reads a series of lines from standard input, each containing a series of integers separated by one or more spaces. It will print the sum of all the integers on all the input lines:
import sys
sum = 0
for line in sys.stdin:
for word in line.split():
sum += int(word)
print(sum)You may optionally pass a character to split() specifying a delimiter that separates words instead of whitespace:
>>> 'big_red_truck'.split('_')
['big', 'red', 'truck']The method join() has the opposite effect: it joins a list of strings into a single string, inserting a given separator string between each pair of strings:
>>> ' '.join(['tall', 'green', 'tree']) 'tall green tree' >>> '_'.join(['tall', 'green', 'tree']) 'tall_green_tree'
Here's a program that reads a single line, breaks it it into words, reverses the words, then prints them back out:
words = input().split() # break input into words
words = words[::-1] # reverse them
print(' '.join(words))Let's run it:
$ py rev.py one fine day day fine one $
Suppose that we write the following declarations:
>>> l = [3, 5, 7, 9] >>> m = l
Now the variables l
and m refer to the
same list. If we
change l[0], then the change will
be visible in m:
>>> l[0] = 33 >>> m[0] 33
This works because in fact in Python every variable is a pointer to an object. So two variables can point to the same objects, such as the list above. An assignment "m = l" does not copy a list. It runs in constant time, and is extremely fast.
Alternatively, we may make a copy of the list l. There are several possible ways to do that, all with the same effect:
>>> l = [3, 5, 7, 9] >>> n = l.copy() # technique 1: call the copy() method >>> n = list(l) # technique 2: call the list() function >>> n = l[:] # technique 3: use slice syntax
Now the list n has the same values as l, but it is a different list. Changes in one list will not be visible in the other:
>>> l[1] = 575 >>> l [3, 575, 7, 9] >>> n [3, 5, 7, 9]
Python provides two different operators for testing equality. The
first is the == operator:
>>> x == y True >>> x == z True
This operator tests for structural equality. In other words, given two lists, it compares them element by element to see if they are equal. (It will even descend into sublists to compare elements there as well.)
The second equality operator is the is
operator:
>>> x is y True >>> x is z False
This operator tests for reference equality. In other words, it returns true only if its arguments actually refer to the same object. (Reference equality is also called physical equality).
You may
want to use each of these operators in various situations. Note that
is returns
instantly (it runs in
constant time), whereas == may
traverse a list in its entirety, so it may be significantly slower.
A list may contain any type of elements, including sublists:
>>> m = [[1], [2, 3, 4, 5], [6]]
A list of lists is a natural way to represent a matrix in Python. Consider this matrix with dimensions 3 x 3:
5 11 12 2 8 7 14 2 6
If we want to store it in Python as a list of lists, normally we will use row-major order, in which each sublist holds a row of the matrix:
m = [ [5, 11, 12], [2, 8, 7], [14, 2, 6] ]
Alternatively we could use column-major order, in which each
sublist is a matrix column; then the first sublist would be
[5, 2, 14]. The choice is
arbitrary, but by convention we will generally use row-major order.
With this ordering, we can use the syntax m[i][j]
to access the matrix element at row i, column j. Do not
forget that rows and columns are numbered from 0:
>>> m = [ [5, 11, 12], [2, 8, 7], [14, 2, 6] ] >>> m[1][0] # row 1, column 0 2
Of course, we may use the index -1 to reference the last row or the last column:
>>> m[-1][-1] = 100 >>> m [[5, 11, 12], [2, 8, 7], [14, 2, 100]]
Here's a program that will read a matrix from the input, with one row of numbers per input line:
# Read a matrix from the input, e.g.
#
# 2 3 4
# 5 1 8
# 0 2 9
import sys
m = []
for line in sys.stdin:
# build a row of the matrix
row = []
for word in line.split():
row.append(int(word))
m.append(row)
print(m)Now suppose that we want to build a zero matrix of a given size, i.e. a matrix whose elements are all 0. Recall that we may use the * operator to build a list of a given length by repeating a given element:
>>> 3 * [0] [0, 0, 0]
So you might think that we can build e.g. a 3 x 3 matrix of zeros by
>>> m = 3 * [3 * [0]] >>> m [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
WARNING: This code looks correct but is not. Let's see what happens if we attempt to set the upper-left element of the matrix:
>>> m[0][0] = 7 >>> m [[7, 0, 0], [7, 0, 0], [7, 0, 0]]
It appears that several matrix elements have changed!
Here's what's going on here: all three of the sublists are actually pointers to the same list. Here is a similar example:
>>> a = [1, 2, 3] >>> m = [a, a] >>> m [[1, 2, 3], [1, 2, 3]] >>> a[0] = 7 >>> m [[7, 2, 3], [7, 2, 3]]
The line m = [a, a] creates a list with
two elements, each of which is a pointer to the list a.
When a changes, the change is visible in
m.
With that understanding, let's revisit our attempt to create a 3 x 3 matrix of zeros:
>>> m = 3 * [3 * [0]] >>> m [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
This code constructs a single list with three zeroes (3
* [0]), and then repeats it 3 times, however the repetition
does not copy the list - it merely makes three pointers to the
same list. And so an update to any matrix element will actually be
visible in three places in the matrix.
Here's a correct way to make a 3 x 3 matrix of zeroes:
m = []
for i in range(3):
m.append(3 * [0])This may seem like more work, though later in this course we'll see how we can write even this form in a single line.
In Python, and in almost every other programming language, we may define our own functions. As a simple example, here's a function that takes an integer 'n' and a string 's', and prints the string n times:
def printout(n, s):
for i in range(n):
print(s)
We use the keyword def to define a
function in Python. In this example, n and s are parameters to
the function. When we call the function, we will
pass values for n and s. Values passed to a function are
sometimes called an arguments. (Programmers sometimes use the
terms "parameter" and "argument" more or less
interchangeably.)
Let's put this function in a file called
printout.py. We can run Python on this
file and specify the '-i' parameter,
which means that Python should read the file and then remain in an
interactive session. In that session, we can call the function as we
like:
$ python -i printout.py >>> printout(5, 'hi') hi hi hi hi hi >>>
A function may return a value. For example:
def add(x, y, z):
return x + y + z + 10
>>> add(2, 4, 6)
22
Note that the return statement will exit
a function immediately, even from within the body of a loop or
nested loop. This is often convenient. For example, let's write a
function to determine whether an integer is prime:
def is_prime(n):
if n < 2:
return False
for i in range(2, n):
if n % i == 0: # i is a factor of n
return False # n is composite
return True # n is primeWhen we wrote this code before in our algorithms class, we used a boolean variable:
prime = True
for i in range(2, n):
if n % i == 0:
prime = False
breakWith a function, the boolean is not needed; arguably this makes the code simpler.
If a function does not return a value explicitly,
then it will return None. Also, if a return
statement does not contain a value, it will return None. For example,
consider this function:
def prime_check(n):
for i in range(2, n):
if n % i == 0:
print('composite')
return
print('prime')Let's call it:
>>> prime_check(14) composite >>> prime_check(23) prime
In both cases the function returned None, and so the interactive interpreter didn't print any return value for the function.