Programming 1, 2020-1
Week 10: Notes

Some of today's topics are covered in these chapters of Introducing Python:

Here are some more notes.

first-class functions

In Python, functions are first-class values. That means that we can work with functions just like with other values such as integers and strings: we can refer to functions with variables, pass them as arguments, return them from other functions, and so on.

Here is a Python function that adds the numbers from 1 to 1,000,000:

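def big_sum():
    total = 0
    for i in range(1, 1_000_001):
        total += i
    return total

We can put this function into a variable f. Note that there are no parentheses after big_sum, so we are not calling the function here:

>>> f = big_sum

And now we can call f just like the original function big_sum:

>>> f()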
500000500000
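Because a function is a value, we can also store functions in data structures and choose one at run time. Here is a minimal sketch that keeps functions in a dictionary. The names square, cube, operations and apply_op are hypothetical, chosen only for this illustration:

```python
def square(x):
    return x * x

def cube(x):
    return x * x * x

# The dictionary values are the functions themselves. There are no
# parentheses after square, cube or abs, so nothing is called here.
operations = {'square': square, 'cube': cube, 'abs': abs}

def apply_op(name, x):
    f = operations[name]   # look up a function by its name
    return f(x)            # then call it like any other function

for name in ['square', 'cube', 'abs']:
    print(name, apply_op(name, -3))
```

This prints square 9, cube -27 and abs 3. The built-in function abs sits in the dictionary alongside our own functions, since built-in functions are values too.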

Let's write a function time_it that takes a function as an argument:

import time

def time_it(f):
    start = time.time()
    x = f()             # call the function we were given
    end = time.time()
    print(f'function ran in {end - start:.2f} seconds')
    return x

Given any function f that takes no arguments, time_it runs f and measures the time that elapses while f is running. It prints this elapsed time, and then returns whatever f returned:

>>> time_it(big_sum)
function ran in 0.04 seconds
500000500000
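As written, time_it only works with functions that take no arguments. One possible generalization, sketched below, forwards any extra arguments to f using *args (extra positional arguments, collected into a tuple) and **kwargs (extra keyword arguments, collected into a dictionary). The names time_call and sum_to are hypothetical:

```python
import time

def time_call(f, *args, **kwargs):
    # Pass along whatever positional and keyword arguments we received
    start = time.time()
    x = f(*args, **kwargs)
    end = time.time()
    print(f'function ran in {end - start:.2f} seconds')
    return x

def sum_to(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

result = time_call(sum_to, 1_000_000)
print(result)
```

The same helper can time a built-in function with keyword arguments, for example time_call(sorted, [3, 1, 2], reverse=True).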

This is a first example illustrating that it can be useful to pass functions to functions. As we will see, there are many other reasons why we might want to do this.

As another example, here is a function max_by that returns the element of an input sequence with the greatest comparison key, where the key of each element x is f(x). If the sequence is empty, max_by returns None:

def max_by(seq, f):
    max_elem = None
    max_val = None
    for x in seq:
        v = f(x)    # comparison key for this element
        # 'is None' tests identity, so it never invokes a custom __eq__
        if max_elem is None or v > max_val:
            max_elem = x
            max_val = v
    return max_elem
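To see how max_by behaves in a couple of edge cases, here is a self-contained copy (testing for None with 'is None', the idiomatic form) together with a few calls. The word list is made up for illustration:

```python
def max_by(seq, f):
    max_elem = None
    max_val = None
    for x in seq:
        v = f(x)
        if max_elem is None or v > max_val:
            max_elem = x
            max_val = v
    return max_elem

words = ['kiwi', 'apple', 'fig', 'mango']

# 'apple' and 'mango' both have length 5. Because the comparison is a
# strict >, a later element with an equal key never replaces the first.
print(max_by(words, len))      # apple

# An empty sequence never enters the loop, so the result is None
print(max_by([], len))         # None

# The key function can be any one-argument function, e.g. abs
print(max_by([-7, 3, 5], abs)) # -7
```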

We can use max_by to find the longest list in a list of lists:

>>> max_by([[1, 7], [3, 4, 5], [2]], len)
[3, 4, 5]

Or we can use it to find the list whose last element is greatest:

def last(s):
    return s[-1]

>>> max_by([[1, 7], [3, 4, 5], [2]], last)
[1, 7]

This capability is so useful that it's built into Python. The built-in function max can take a keyword argument key holding a function that works exactly like the second argument to max_by:

>>> max([[1, 7], [3, 4, 5], [2]], key=len)
[3, 4, 5]
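The built-in function min accepts the same key argument. Also, max and min raise ValueError when given an empty sequence, unless we pass the optional keyword argument default, whose value is then returned instead. A short sketch:

```python
lists = [[1, 7], [3, 4, 5], [2]]

# min takes a key argument just like max
print(min(lists, key=len))              # [2]

# Any one-argument function works as a key, including our own
def last(s):
    return s[-1]

print(max(lists, key=last))             # [1, 7]

# With an empty sequence, default is returned instead of raising ValueError
print(max([], key=len, default=None))   # None
```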

The built-in function sorted and the list method sort() take a similar key argument, so that you can sort by any key you like. For example:

>>> l = [[2, 7], [1, 3, 5, 2], [3, 10, 6], [8]]
>>> l.sort(key=len)
>>> l
[[8], [2, 7], [3, 10, 6], [1, 3, 5, 2]]
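Unlike sort(), which modifies a list in place, sorted() returns a new list and leaves its input unchanged. Both also accept reverse=True to sort from the largest key to the smallest. A small sketch with a made-up word list:

```python
words = ['banana', 'fig', 'apple', 'kiwi']

# sorted() builds a new list; the original list is not modified
by_length = sorted(words, key=len)
print(by_length)   # ['fig', 'kiwi', 'apple', 'banana']
print(words)       # ['banana', 'fig', 'apple', 'kiwi']

# reverse=True orders the elements from longest to shortest
longest_first = sorted(words, key=len, reverse=True)
print(longest_first)   # ['banana', 'apple', 'kiwi', 'fig']
```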

lambda expressions

Let's return to the previous example where we were given a list of lists, and found the list whose last element is greatest:

def last(s):
    return s[-1]

>>> max_by([[1, 7], [3, 4, 5], [2]], last)
[1, 7]

It's a bit of a nuisance to have to define a separate function last here. Instead, we can use a lambda expression:

>>> max_by([[1, 7], [3, 4, 5], [2]], lambda l: l[-1])
[1, 7]

A lambda expression creates a function "on the fly", without giving it a name. In other words, a lambda expression creates an anonymous function.

A function created by a lambda expression is no different from any other function: we can call it, pass it as an argument, and so forth. Even though the function is initially anonymous, we can certainly put it into a variable:

>>> abc = lambda x, y: 2 * x + y
>>> abc(10, 3)
23

The assignment to abc above is basically equivalent to

def abc(x, y):
    return 2 * x + y

which is how we would more typically define this function.
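Lambdas are most convenient as short key functions. This sketch (the data is made up for illustration) sorts pairs by their second component without defining a named helper. It also shows the one visible difference between the two definitions of abc above: a function created by a lambda has the name '<lambda>'. The name abc_def is hypothetical:

```python
people = [('ann', 31), ('bob', 25), ('cat', 28)]

# Sort by the second component of each pair
by_age = sorted(people, key=lambda p: p[1])
print(by_age)    # [('bob', 25), ('cat', 28), ('ann', 31)]

abc = lambda x, y: 2 * x + y

def abc_def(x, y):
    return 2 * x + y

# Both functions compute the same values...
print(abc(10, 3), abc_def(10, 3))        # 23 23

# ...but only the def-style function records its own name
print(abc.__name__, abc_def.__name__)    # <lambda> abc_def
```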

nested functions

Python allows us to write nested functions, i.e. functions that are defined inside other functions or methods.

As an example, suppose that we'd like to write a function replace_with_max() that takes a square matrix m and returns a matrix n in which each value in m is replaced with the maximum of its neighbors in all 4 directions. For example, if m is

2 4
5 9

then replace_with_max(m) will return

5 9
9 5

As a first attempt, we might write

def replace_with_max(m):
    size = len(m)

    # Make a matrix of dimensions (size x size) filled with zeroes
    n = [ size * [ 0 ] for _ in range(size) ]

    for r in range(size):
        for c in range(size):
            n[r][c] = max(m[r - 1][c], m[r + 1][c],
                          m[r][c - 1], m[r][c + 1])

    return n

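However, we have a problem: if a square (r, c) is at the edge of the matrix, then a reference such as m[r][c + 1] might go out of bounds and raise an IndexError. Worse, a reference such as m[r - 1][c] with r = 0 will not raise an error at all: Python interprets the index -1 as the last row, so we would silently compare against the wrong element.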

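To solve this problem, let's write a nested helper function get(i, j) that returns the element m[i][j] if the position (i, j) is inside the matrix, and otherwise returns -math.inf, i.e. -∞. Because -∞ is less than every number, a position outside the matrix will never be chosen as the maximum. Here is the improved function: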

import math

def replace_with_max(m):
    def get(i, j):
        if 0 <= i < size and 0 <= j < size:
            return m[i][j]
        else:
            return -math.inf
    
    size = len(m)
    
    # Make a matrix of dimensions (size x size) filled with zeroes
    n = [ size * [ 0 ] for _ in range(size) ]
    
    for r in range(size):
        for c in range(size):
            n[r][c] = max(get(r - 1, c), get(r + 1, c),
                          get(r, c - 1), get(r, c + 1))
                          
    return n

Notice that the nested function can refer to the parameter m. It can also refer to the local variable size that is defined in its containing function replace_with_max(). This is quite convenient. If we declared the helper function get() outside the function replace_with_max(), it would have to take m and size as extra parameters, and we would have to pass these values on each call to get(), which would be a bother.
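To check the nested-function version against the 2 x 2 example above, here is a self-contained copy with a small test. The 3 x 3 matrix is an extra example made up for this check:

```python
import math

def replace_with_max(m):
    def get(i, j):
        # Positions outside the matrix count as -infinity, so they never win
        if 0 <= i < size and 0 <= j < size:
            return m[i][j]
        else:
            return -math.inf

    size = len(m)
    n = [size * [0] for _ in range(size)]
    for r in range(size):
        for c in range(size):
            n[r][c] = max(get(r - 1, c), get(r + 1, c),
                          get(r, c - 1), get(r, c + 1))
    return n

print(replace_with_max([[2, 4], [5, 9]]))   # [[5, 9], [9, 5]]
print(replace_with_max([[1, 2, 3],
                        [4, 5, 6],
                        [7, 8, 9]]))         # [[4, 5, 6], [7, 8, 9], [8, 9, 8]]
```

Note that get() refers to size even though size is assigned after get() is defined. This works because the nested function looks up size when it is called, and by then the assignment has already run.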

functions as return values

A function can return a function. As an example, let's write a function add_n(n) that takes an integer n and returns a function that adds n to its argument. As one possible approach, we can define a nested function and then return it:

def add_n(n):
    def adder(x):
        return x + n
        
    return adder

Let's try it:

>>> f = add_n(10)
>>> f(5)
15

Alternatively, we can write add_n using a lambda:

def add_n(n):
    return lambda x: x + n
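Each call to add_n creates a separate function that remembers its own value of n, and the returned functions can be stored and passed around like any other value. A short sketch (the names add_3, add_10 and inc are hypothetical):

```python
def add_n(n):
    return lambda x: x + n

add_3 = add_n(3)
add_10 = add_n(10)

# Each function keeps the n it was created with
print(add_3(5), add_10(5))        # 8 15

# A returned function can be used wherever a function is expected
inc = add_n(1)
print([inc(x) for x in [1, 2, 3]])   # [2, 3, 4]
print(max([4, 9, 2], key=add_n(-100)))   # 9: shifting every key keeps the order
```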

transforming a function

We can write a function that takes a function f as an argument and returns a transformed function based on f.

For example, let's write a function twice() that takes a function f and returns a function g such that g(x) = f(f(x)) for any x. We can define a nested function and return it:

def twice(f):
    def g(x):
        return f(f(x))

    return g

Let's try it:

>>> f = twice(lambda x: x + 10)
>>> f(5)
25

We can even pass the result of twice() back to the same function, yielding a function that applies the original function four times:

>>> f = twice(twice(lambda x: x + 10))
>>> f(5)
45

As before, we can alternatively define twice() using a lambda:

def twice(f):
    return lambda x: f(f(x))
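The same idea generalizes to applying a function any number of times. Here is a sketch of a hypothetical function repeat(f, k) that returns a function applying f to its argument k times, so that twice(f) behaves like repeat(f, 2):

```python
def repeat(f, k):
    """Return a function that applies f to its argument k times (k >= 0)."""
    def g(x):
        for _ in range(k):
            x = f(x)
        return x

    return g

add_10 = lambda x: x + 10
print(repeat(add_10, 4)(5))   # 45, the same as twice(twice(add_10))(5)
print(repeat(add_10, 0)(5))   # 5: applying f zero times changes nothing
print(repeat(lambda s: s + '!', 3)('hi'))   # hi!!!
```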

__eq__ and __hash__

Consider a simple class that represents a point in two dimensions:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

Let's create two Point objects:

>>> p = Point(3, 4)
>>> q = Point(3, 4)

These points have the same x and y coordinates. So will Python consider them to be equal?

>>> p == q
False

It does not.

By default, Python's == operator compares instances of user-defined classes based on object identity. In other words, two objects are considered to be equal only if they are actually the same object.

For some classes, this notion of equality may be appropriate. In other cases, we may wish to redefine what equality means for our class. We can do that by implementing the magic method __eq__. Python will call this magic method automatically when the == operator compares two objects.

Let's add an implementation of __eq__ to our Point class that says that two points are equal if they have the same x and y coordinates:

    def __eq__(self, q):
        return self.x == q.x and self.y == q.y

Now p and q will be equal:

>>> p = Point(3, 4)
>>> q = Point(3, 4)
>>> p == q
True
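The __eq__ method above assumes that its argument has x and y attributes, so an expression such as p == 5 would raise AttributeError. A common, more defensive pattern (one option among several) returns the special value NotImplemented when the other object is not a Point. Python then falls back to its default comparison, so == simply yields False. Python 3 also derives != from __eq__ automatically:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Only compare coordinates with another Point
        if not isinstance(other, Point):
            return NotImplemented
        return self.x == other.x and self.y == other.y

p = Point(3, 4)
print(p == Point(3, 4))   # True
print(p == 5)             # False, instead of an AttributeError
print(p != Point(1, 2))   # True: != is derived from __eq__
```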

Let's now try to make a set that contains the point p:

>>> p = Point(3, 4)
>>> s = set()
>>> s.add(p)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'Point'

Python won't allow this. Recall that Python implements a set internally using a hash table. The problem is that we have redefined __eq__, but now Python's default hash function is inconsistent with our definition of equality. That's because the default hash function is based on object identity, so it may return distinct hash codes for any two distinct objects. Suppose that we build a set s containing p, and then evaluate 'q in s' to ask whether q is in the set. Because p and q are equal according to our definition, we would expect the answer to be True. However, Python's default hash function may return different values for p and q, and so p and q may be in different hash chains if Python uses this hash function. And so 'q in s' could return False. To avoid this problem, Python won't let us add p to a set.
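The mechanism behind this is that when a class defines __eq__ but not __hash__, Python sets the class's __hash__ attribute to None, which marks its instances as unhashable. A class that defines neither keeps the identity-based default. A quick check (the class name Plain is hypothetical):

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, q):
        return self.x == q.x and self.y == q.y

# Defining __eq__ without __hash__ makes Python set __hash__ to None
print(Point.__hash__ is None)   # True

try:
    hash(Point(3, 4))
except TypeError as e:
    print(e)                    # a message such as: unhashable type: 'Point'

# A class that doesn't define __eq__ keeps the default, identity-based hash
class Plain:
    pass

print(Plain.__hash__ is None)   # False
```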

However, we can resolve the situation by defining our own hash function for the class. We can do this by implementing the magic method __hash__, which Python will call automatically when it needs a hash code for an object. Let's write __hash__ in our Point class:

    def __hash__(self):
        return hash((self.x, self.y))

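Our implementation simply constructs a tuple containing self.x and self.y, then calls the built-in function hash() to hash the tuple. Two points that are equal according to our __eq__ produce equal tuples, so they are guaranteed to get equal hash codes, which is exactly the consistency a hash table requires. This is a typical way to write __hash__.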

Now we can add p to a set:

>>> p = Point(3, 4)
>>> q = Point(3, 4)
>>> s = set()
>>> s.add(p)
>>> q in s
True
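With both __eq__ and __hash__ defined, Point objects behave as expected in sets and as dictionary keys: equal points collapse into one set element, and a freshly constructed equal point finds an existing dictionary entry. A sketch with made-up data:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, q):
        return self.x == q.x and self.y == q.y

    def __hash__(self):
        # Equal points have equal (x, y) tuples, hence equal hash codes
        return hash((self.x, self.y))

# Duplicates (according to our __eq__) collapse in a set
s = {Point(3, 4), Point(3, 4), Point(1, 2)}
print(len(s))                 # 2

# Points can also serve as dictionary keys
names = {Point(0, 0): 'origin'}
print(names[Point(0, 0)])     # origin
```

One caution: if you change p.x or p.y after adding p to a set, its hash code changes and the set may no longer be able to find it, so objects used this way are best treated as immutable.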