Week 4: Notes

running time of list operations

In the last lecture we saw various operations on lists. Now that we know about big O from our algorithms class, we can discuss the running time of list operations.

Most fundamentally, getting or setting a list element by index runs in O(1), and is extremely fast. In Python, getting the length of a list via len() also runs in O(1).

Suppose that the length of a list l is N. We may insert an element x at index i by calling l.insert(i, x), and may delete the element at index i by running del l[i]. Both of these operations take O(N), since they may need to shift over many elements in memory. However, the operation del l[-1], which deletes the last list element, will run in O(1).

Now let's consider appending an element by calling l.append(x), an extremely common operation. append() runs in O(1) on average. This means that although a single call to append() may occasionally take a relatively long time to run, a sequence of N append operations to an initially empty list will always run in O(N). For example, consider this program:

n = int(input('Enter n: '))

a = []
for i in range(n):
    a.append(i)

This program will run in O(N), and so the average running time of each call to append() is O(1).

Occasionally we will write O(1)^* to mean O(1) on average.

Let's consider the time to concatenate two lists using the + operator. If A is the length of list l, and B is the length of list m, then (l + m) will run in time O(A + B). Alternatively, we may write that it runs in O(|l| + |m|). The important point is that this operator always builds a new list, which takes time proportional to the new list's length.

This may have some surprising consequences. For example, consider these two programs, which differ only in their last line:

(1)

n = int(input('Enter n: '))
a = []
for i in range(n):
    a.append(i)

(2)

n = int(input('Enter n: '))
a = []
for i in range(n):
    a = a + [i]

How long will they take to run?

The first program makes N calls to append(), which runs in O(1) on average. The total time is N ⋅ O(1) = O(N).

The second program performs N list concatenations. The first builds a list of length 1; the second builds a length of 2, and so on, up to length N. Concatenating two lists to form a list of length M takes O(M). So the total time will be

O(1 + 2 + 3 + ... + N) = O(N²)

If you like, try running the two programs and entering N = 1,000,000. The first will complete instantly. The second will run for a long time. If N is 1,000,000, then a program that runs in O(N²) will be a million times slower than a program that runs in O(N) if the underlying constants are the same in both big-O functions.

The moral of the story: use append(), not concatentation, to add a single element to a list!

Finally, recall that x in l tests whether the list l contains the value x. This will run in O(N), since Python may need to scan all list elements to determine whether x is present.

tuples

A tuple in Python is an immutable sequence. Its length is fixed, and you cannot update values in a tuple. Tuples are written with parentheses:

>>> t = (3, 4, 5)
>>> t[0]
3
>>> t[2]
5

Essentially a tuple is an immutable list. All sequence operations will work with tuples: for example, you can access elements using slice syntax, and you can iterate over a tuple:

>>> t = (3, 4, 5, 6)
>>> t[1:3]
(4, 5)
>>> for x in t:
...    print(x)
... 
3
4
5
6
>>>

The built-in function tuple() will convert any sequence to a tuple:

>>> tuple([2, 4, 6, 8])
(2, 4, 6, 8)

>>> tuple(range(10))
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

We will often use tuples in our code. They are (slightly) more efficient than lists, and their immutability makes a program easy to understand, since when reading the code you don't need to worry about if/when their elements might change.

A tuple of two elements is called a pair. A tuple of three elements is called a triple. Usually in our code we will use tuples when we want to group a few elements together. For example, a pair such as (3, 4) is a natural way to represent a point in a 2-dimensional space. Similarly, a triple such as (5, 6, -10) may represent a point in 3-space.

multiple assignment

In Python we may assign to multiple variables at once to unpack a list or tuple:

>>> [a, b, c] = [1, 3, 5]
>>> a
1
>>> b
3
>>> c
5

>>> (x, y) = (4, 5)
>>> x
4
>>> y
5

In these assignments you may even write a list of variables on the left, without list or tuple notation:

>>> a, b, c = [1, 3, 5]
>>> x, y = (4, 5)

On the right side, you may write a sequence of values without brackets or parentheses, and Python will automatically form a tuple:

>>> t = 3, 4, 5
>>> t
(3, 4, 5)

In fact Python will freely convert between sequence types in assignments of this type. For example, all of the following are valid:

>>> a, b, c = 3, 4, 5
>>> [d, e] = (10, 11)
>>> f, g, h = range(3)   # sets f = 0, g = 1, h = 2

These "casual" conversions are often convenient. (They would not be allowed in stricter languages such as Haskell.)

Finally, note that variables on the left side of such as assignment may also appear on the right. The assignment is simultaneous: the new values of all variables on the left are computed using the old values of variables on the right. For example, we may swap the values of two variables like this:

>>> x, y = 10, 20
>>> x
10
>>> y
20
>>> x, y = y, x
>>> x
20
>>> y
10

In fact we already used this form of multiple simultaneous assignment when we implemented Euclid's algorithm in our algorithms class a couple of weeks ago. Here is that code again:

# compute gcd(a, b)
while b != 0:
    a, b = b, a % b

lists of tuples

A list of tuples is a convenient representation for a set of points. Let's write a program that will read a set of points from standard input, and determine which two points are furthest from each other. The input will contain one point per line, like this:

2 5
6 -1
-3 6
...

Here is the program:

import math
import sys

points = []
for line in sys.stdin:
    xs, ys = line.split()    # multiple assignment
    point = (int(xs), int(ys))
    points.append(point)

print(points)

m = 0   # maximum so far

for px, py in points:
    for qx, qy in points:
        dist = math.sqrt((px - qx) ** 2 + (py - qy) ** 2)
        if dist > m:
            m = dist

print(m)

The line xs, ys = line.split() is a multiple assignment as we saw in the previous section. Note that this will work only if each input line contains exactly two words. If there are more or fewer, it will fail:

>>> xs, ys = '20 30'.split()   # OK

>>> xs, ys = '20 30 40'.split()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 2)

Above, notice that a for loop may assign to multiple variables on each iteration of the loop when iterating over a sequence of sequences (as in this example). We could instead write

for p in points:
    for q in points:
        px, py = p
        qx, qy = q

but the form "for px, py in points" is more convenient.

The program above is a bit inefficient, since it will compute the distance between each pair of points twice. We may modify it so that the distance between each pair is computed only once, like this:

n = len(points)
for i in range(n):
    for j in range(i + 1, n):
        px, py = points[i]
        qx, qy = points[j]
        dist = math.sqrt((px - qx) ** 2 + (py - qy) ** 2)
        if dist > m:
            m = dist

The first time the inner loop runs, it will iterate over (N - 1) points. The second time it runs, it will iterate over (N - 2) points, and so on. The total running time will be O((N - 1) + (N - 2) + ... + 1) = O(N²). So it's still quadratic, though about twice as efficient as the previous version.

Could we make the program more efficient? In fact an algorithm is known that will find the two furthest points in O(N log N). You might learn about this algorithm in a more advanced course about computational geometry.

None

None is a special value in Python that represents nothingness. It is often useful to represent the absence of a value.

For example, here's a program that computes the maximum of all numbers read from standard input. It keeps the maximum in a variable 'mx', which is initialized to None before any numbers are read:

import sys

mx = None

for line in sys.stdin:
    x = int(line)
    if mx == None or x > mx:
        mx = x

print(f'max = {mx}')

Be aware that the Python interactive intepreter prints nothing at all when you give it an expression whose value is None:

>>> x = None
>>> x
>>>

(None is similar to the special value null in some other languages such as Java and C#.)

function definitions

In Python, and in almost every other programming language, we may define our own functions. As a simple example, here's a function that takes an integer 'n' and a string 's', and prints the string n times:

def printout(n, s):
    for i in range(n):
        print(s)

We use the keyword def to define a function in Python. In this example, n and s are parameters to the function. When we call the function, we will pass values for n and s. Values passed to a function are sometimes called an arguments. (Programmers sometimes use the terms "parameter" and "argument" more or less interchangeably.)

Let's put this function in a file called printout.py. We can run Python on this file and specify the '-i' parameter, which means that Python should read the file and then remain in an interactive session. In that session, we can call the function as we like:

$ python -i printout.py
>>> printout(5, 'hi')
hi
hi
hi
hi
hi
>>>

A function may return a value. For example:

def add(x, y, z):
    return x + y + z + 10

>>> add(2, 4, 6)
22

Note that the return statement will exit a function immediately, even from within the body of a loop or nested loop. This is often convenient. For example, let's write a function to determine whether an integer is prime:

def is_prime(n):
    if n < 2:
        return False
    
    for i in range(2, n):
        if n % i == 0:          # i is a factor of n
            return False        # n is composite
    
    return True # n is prime

When we wrote this code before in our algorithms class, we used a boolean variable:

prime = True
for i in range(2, n):
    if n % i == 0:
        prime = False
        break

With a function, the boolean is not needed; arguably this makes the code simpler.

If a function does not return a value explicitly, then it will return None. Also, if a return statement does not contain a value, it will return None. For example, consider this function:

def prime_check(n):
    for i in range(2, n):
        if n % i == 0:
            print('composite')
            return
        
    print('prime')

Let's call it:

>>> prime_check(14)
composite
>>> prime_check(23)
prime

In both cases the function returned None, and so the interactive interpreter didn't print any return value for the function.

local and global variables

Consider this Python program:

x = 7

def abc(a):
    i = a + x
    return i

def ha():
    i = 4
    print(abc(2))
    print(x + i)
ha()

print(x)

The variable x declared at the top is a global variable. Its value is visible everywhere: both inside the functions abc() and ha(), and also in the statement print(x) at the end of the program.

The variables i declared inside abc() and ha() are local variables. They are different variables: when the line "i = a + x" executes inside abc(), that does not change the value of i in ha().

Local variables are a fundamental feature of every modern programming language. Because a local variable's scope (the area of the program where it is visible) is small, it is easy to understand how the variable will behave. In larger programs, I recommend making variables local whenever possible.

Now consider this program, with two global variables at the top:

a = 10
b = 20

def abc(n):
    a = n
    return a + b

Let's try it:

>>> abc(50)
70
>>> a
10
>>>

We can see that the assignment "a = n" above did not set the value of the global a. Instead, it set a local variable called a. However, the statement "return a + b" did read the value of the global variable b.

What if we want the function abc() to set the global a? To achieve that, we can add a global declaration to the function:

def abc(n):
    global a
    a = n
    return a + b

Now calling abc() will set a:

>>> abc(100)
120
>>> a
100

We see that a function can read the value of a global variable without declaring it as global. But if a function wants to assign to a global variable, it must declare the variable as global.

To be more precise, here is how Python determines whether each variable in a function is local or global:

If there is a global decaration, the variable is global.
Otherwise, if the function ever assigns to the variable, it is considered local. If not, it is considered global (and must be defined somewhere outside the function).
Note that it is impossible to write a function that uses both a local variable "x" and also a global variable "x". Each variable name such as "x" is either always local or always global within a single function body.

When you write a larger program, it's best to have few global variables whose values may change, or even none. They can make a program hard to understand, since any code in the entire program could possibly modify them.

On the other hand, globals that you only read from are fine - essentially they just represent constant data. In Python, by convention we usually capitalize the names of such globals to emphasize that they are constant. For example:

SECONDS_PER_DAY = 24 * 3600

passing by value and reference

Consider this function, which adds all values in a list (or other iterable):

def sum_of(a):
    s = 0
    for x in a:
        s += x
    return s

Let's call it with a list of 1,000,000 elements:

>>> l = 1_000_000 * [5]
>>> sum_of(l)
5000000

When we called the function, did it copy the list when it passed it to the function? Actually no. The function only received a pointer to the list. To see this more easily, let's write a function that modifies a list element:

def inc(a):
    a[0] += 1

And let's call it:

>>> l = [2, 4, 6]
>>> inc(l)
>>> l
[3, 4, 6]

We see that Python passes lists by reference. When the program calls inc(l), then as the function runs l and a are the same list. If we modify the list in the function, the change is visible in the caller.

The opposite of passing by reference is passing by value, in which the function receives a copy of the data that is passed. In some languages (e.g. C++), an array may be passed by value.

For a large data structure such as an array with 1,000,000 elements, passing by reference will be much more efficient than passing by value. And in fact Python passes references when you pass any value to a function (except for a few small values such as None, True and False, though you can't easily tell the difference since these are immutable).

Now, strictly speaking, Python actually passes references to objects by value. This means that when you pass an object to a function, the function parameter receives a reference to the object, however it is a separate reference than the one in the caller. For example:

def foo(x):
    x = 'hello'

>>> y = 'yo'
>>> foo(y)
>>> y
'yo'

def bar(a):
    a[0] = 100
    a = [2, 4, 6]

>>> b = [0, 0, 0]
>>> bar(b)
>>> b
[100, 0, 0]

When we call the function foo(), x refers to the same string as y (the string is not copied). However, assigning to x does not change y, because x is a separate reference.

Similarly, when we call the function bar(), a refers to the same array as b (the array is not copied). If we modify that array, the change is visible in b. However, assigning to a does not change b, because a is a separate reference.

Many other languages such as Java, C#, and JavaScript use the same calling mechanism.

variable numbers of arguments

We may sometimes wish to write a function that can take a variable number of arguments. For example, we may wish to write a function that returns the average of all its arguments, no matter how many there are:

>>> avg(1.0, 3.0, 8.0)
4.0
>>> avg(2.0, 4.0, 6.0, 8.0, 10.0)
6.0

We may write this function using a parameter preceded by the character '*', which means that the parameter should gather all of its arguments into a tuple:

def avg(*args):
    return sum(args) / len(args)

When we make the call 'avg(1.0, 3.0, 8.0)', inside the function the variable 'args' has the value (1.0, 3.0, 8.0), which is a tuple of 3 values. (Recall that the sum() and len() functions work on tuples just as they do on lists and other sequences).

A parameter preceded by '*' can have any name, but the name 'args' is conventional in Python.

When we call a function we may sometimes have a list (or other sequence) that contains the arguments we'd like to pass. In this case, we can specify '*' before an argument to specify that Python should explode its values into separate arguments. For example, consider this function:

def add(x, y, z, q):
    return x + y + z + q + 10

When we call it, we may specify its arguments individually, or by exploding them from a list:

>>> add(10, 20, 30, 40)
110
>>> l = [10, 20, 30, 40]
>>> add(*l)
110

More commonly, this situation occurs when a function accepts a variable number of arguments, and we want to pass arguments from a list. For example, calling the avg() function we defined above:

>>> l = [3.0, 4.0]
>>> avg(*l)
3.5

default parameter values

When we define a function in Python, we can specify default parameter values that are used if the caller doesn't specify values for these parameters. For example:

# Split a string into two pieces.
def chop(s, frac = 0.5):
    n = int(len(s) * frac)
    return s[:n], s[n:]

>>> chop('watermelon', 0.3)
('wat', 'ermelon')
>>> chop('watermelon')
('water', 'melon')

In the second call above, we didn't specify a value for frac, so the function used the default value of 0.5.

In a function declaration, parameters with default values must appear at the end of the parameter list.

Many Python functions in the standard library have parameters with default values. For example, the pop(i) method removes a value at a given index i in a list, and returns the value. If the parameter 'i' is not specified, it defaults to -1, meaning that it will remove the value at the end of the list:

>>> l = [22, 44, 66, 88, 110]
>>> l.pop(3)
88
>>> l
[22, 44, 66, 110]
>>> l.pop()
110

You should be aware of one subtle danger in declaring default parameter values. When Python sees a function declaration with default values, it evaluates each of those values to an object which is reused on all invocations of the function. That leads to this surprising behavior:

def add(x, y, l = []):    # l defaults to an empty list
    l.append(x)
    l.append(y)
    return l

>>> add(3, 5, [7, 8])
[7, 8, 3, 5]
>>> add(3, 5)
[3, 5]
>>> add(10, 11)
[3, 5, 10, 11]         # unexpected: 3 and 5 are present in the list!

You can avoid that behavior by using an immutable value such as None as the default:

def add(x, y, l = None):
    if l == None:
        l = []
    l.append(x)
    l.append(y)
    return l

Now the function behaves as you might expect:

>>> add(3, 5)
[3, 5]
>>> add(10, 11)
[10, 11]

keyword arguments

When you call any function in Python, you may optionally specify parameter names when you provide arguments. An argument with a name is called a keyword argument.

For example, consider this function:

def digit_sum(x, y, z):
    return 100 * x + 10 * y + z

We may call it in any of the following ways:

>>> digit_sum(3, 4, 5)
345
>>> digit_sum(x = 3, y = 4, z = 5)
345
>>> digit_sum(z = 5, y = 4, x = 3)
345
>>> digit_sum(3, z = 5, y = 4)
345

Notice that keyword arguments may appear in any order in a function call.

If a function's parameters have default values, when we call the function we may want to provide arguments for only some parameters. We can use names to indicate which argument(s) we are providing. For example:

def digit_sum2(x = 8, y = 8, z = 8):
    return 100 * x + 10 * y + z

>>> digit_sum2(y = 2)
828

You may sometimes want to specify parameter names in a function call even when they are not necessary, since they can make a function call more readable (especially if it has many arguments).

A function may have keyword-only arguments, which are listed after a * in a function declaration. For example:

def foo(x, *, y = 1, z = 0):
    return x * y + z

In this declaration y and z are keyword-only arguments. The caller must specify parameter names when passing these:

>>> foo(10)
10
>>> foo(10, 2, 3)
Traceback (most recent call last):
  File "<python-input-3>", line 1, in <module>
    foo(10, 2, 3)
    ~~~^^^^^^^^^^
TypeError: foo() takes 1 positional argument but 3 were given
>>> foo(10, y = 2, z = 3)
23

Every keyword-only argument must always have a default value.

Many functions and methods in Python's standard library have keyword-only arguments. For example, the sort() method sorts a list. It looks like this in our Python quick reference:

        l.sort(*, reverse = False, key = None)

We see that reverse and key are keyword-only arguments. Let's not worry about key for now (we'll see it in a later lecture). However, reverse will be useful for us even now: it tells Python to sort the list in reverse order. Because it's a keyword-only argument, we must specify it by name when calling this method:

>>> l = [5, 2, 8, 1, 3]

>>> l.sort(True)
Traceback (most recent call last):
  File "<python-input-6>", line 1, in <module>
    l.sort(True)
    ~~~~~~^^^^^^
TypeError: sort() takes no positional arguments

>>> l.sort(reverse = True)
>>> l
[8, 5, 3, 2, 1]