Week 8: Notes

modules

A module is a collection of definitions that Python code can import and use. We've been using modules in Python's standard library for weeks now. For example, the line

import math

lets us use functions in the math module, which is built into the standard library. This statement loads the module into memory and makes a variable called 'math' that points to it. Like everything else in Python, a module is actually an object:

>>> import math
>>> math
<module 'math' (built-in)>

After importing a module, we can access any name defined by the module by prefixing it with the module name:

>>> math.sin(0)
0.0

Here is a brief overview of some other ways to import (some of which we have seen before). We may wish to import some of the module's names directly into our namepace, so that we can access them without a prefix. We can do that using a 'from…import' statement:

>>> from math import sin, cos
>>> sin(0) + cos(0)
1.0

We can import all of a module's names using the '*' wildcard character:

>>> from math import *
>>> log(1) + sqrt(1)
1.0

The 'import...as' statement will import a module using an alternative name of our choice:

>>> import math as m
>>> m.ceil(4.5) + m.floor(4.5)
9

We may also specify an alternate name with importing a function from a module:

>>> from math import sin as sn
>>> sn(0)
0.0

In some larger libraries, modules may have submodules. For example, we've already seen the submodule matplotlib.pyplot, which we imported like this:

>>> import matplotlib.pyplot as plt

packages

A package is a special kind of module that can contain both top-level definitions and other modules. Packages are also a unit of software distribution. In other words, it's possible (and fairly easy) to write a package of Python code and then make it available for others to install and use on their systems.

Python includes a package manager called 'pip' that can install and remove packages on your system. As a first experiment, you can run 'pip list' to see a list of packages that are currently installed. (On macOS, you will need to run 'pip3' instead of 'pip'.) When I run this command, I see output that begins like this:

$ pip list
Package                         Version
------------------------------- ---------------
appdirs                         1.4.4
attrs                           22.1.0
banking.statements.nordea       1.3.0
banking.statements.osuuspankki  1.3.4.dev0
bcrypt                          3.2.0
…

I did not explicitly install most of these packages; instead, they were installed by various Python-based programs on my system.

We can easily install additional packages. pip finds packages to install in an enormous repository called PyPI (the Python Package Index), which currently lists over 580,000 packages contributed by thousands of users. For example, the sty package lets a program print colored output to the terminal. Let's install it:

$ pip install sty
Defaulting to user installation because normal site-packages is not writeable
Collecting sty
  Using cached sty-1.0.4-py3-none-any.whl (11 kB)
Installing collected packages: sty
Successfully installed sty-1.0.4
$

Now we can produce colored output. For example:

from sty import fg

def out():
    print(f'This text is {fg.green}green{fg.rs} and this text is {fg.red}red{fg.rs}.')

(Note that the blue coloring in the code above is the same syntax coloring that you see in all Python code in these notes. It has nothing to do with colored output!)

Let's run the function above:

>>> out()
This text is green and this text is red.

Note that if you want sty's colored output to work on Windows, you'll need to add an extra hack to your code as described on sty's web site:

import os

os.system('')

(It's unfortunate that this is necessary, and I hope that a future release of sty will work automatically on Windows.)

magic methods for operator overloading

Python's magic methods let us specify custom behavior for our classes. We have already seen two magic methods in Python: __init__ for initializing an object, and __repr__ for converting an object to a string.

Additional magic methods let us customize the behavior of the +, -, *, /, and // operators:

__add__
__sub__
__mul__
__truediv__
__floordiv__

More similar operators exist; you can read about them in the official Python reference. But the ones listed here are enough for us for now.

By the way, providing custom behavior for operators such as these is called operator overloading, and is possible in many, but not all, programming languages. (For example, Java has no operator overloading.)

To see how these magic methods may be used, let's revisit the Vec class that we wrote in the last lecture, representing a vector of arbitrary dimension:

class Vec:
    def __init__(self, *args):
        self.a = args

    # Generate a string representation such as [3 5 10].
    def __repr__(self):
        w = []
        for x in self.a:
            w.append(str(x))
        return '[' + ' '.join(w) + ']'

    def add(self, w):
        assert len(self.a) == len(w.a), 'vectors must have same dimension'
        sum = []

        for i in range(len(self.a)):
            sum.append(self.a[i] + w.a[i])

        return Vec(*sum)

Previously, we could add vectors using the add() method:

$ python -i vector.py 
>>> v = Vec(2.0, 4.0, 5.0)
>>> w = Vec(1.0, 2.0, 3.0)
>>> z = v.add(w)
>>> z
[3.0 6.0 8.0]

To allow vectors to be added using the + operator, we only need to change the name of the "add" method above to "__add__":

def __add__(self, w):
    assert len(self.a) == len(w.a), 'vectors must have same dimension'
    sum = []
    ...

Now we can add Vector objects using +:

$ python -i vector.py 
>>> v = Vec(2.0, 4.0, 5.0)
>>> w = Vec(1.0, 2.0, 3.0)
>>> z = v + w
>>> z
[3.0 6.0 8.0]

Behind the scenes, the + operator just calls the magic method, and in fact we can call it directly if we like:

>>> v.__add__(w)
[3.0 6.0 8.0]

Let's now implement the multiplication operator *. Actually there are two kinds of multiplication on vectors. We can multiply two vectors to compute their dot product, producing a scalar. Or we can multiply a vector by a scalar to produce another vector. We will implement both of these in a single method:

def __mul__(self, x):
    if isinstance(x, Vec):
        # Compute the dot product.
        s = 0
        for i in range(len(self.a)):
            s += self.a[i] * x.a[i]
        return s
    else:   # scalar multiplication
        b = []
        for i in range(len(self.a)):
            b.append(x * self.a[i])
        return Vec(*b)

Above, we've used the built-in function isinstance() that tests whether an object belongs to a given type.

Our multiplication operator works for both vector and scalar multiplication:

>>> v = Vec(2, 4, 5)
>>> v * v
45
>>> v * 10
[20 40 50]

However, with our existing code we cannot perform the scalar multiplication in the other direction:

>>> 10 * v
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for *: 'int' and 'Vec'

That's because Python will only invoke the __mul__ magic method (and similar methods) on the object that is on the left side of the multiplication operator.

However, there is an additional magic method __rmul__ that will be invoked on the object on the right side of the multiplication operator. (Similar methods __radd__, __rsub__ and so on also exist.) So let's implement __rmul__ now. Our implementation is simple:

def __rmul__(self, x):
    return self * x

Now we can multiply a scalar by a vector in that order:

>>> v = Vec(2, 4, 5)
>>> 10 * v
[20 40 50]

How does this work? When we write 10 * v, Python calls __rmul__(v, 10) since v is on the right. Then the line 'return self * x' computes v * 10, which in turn automatically invokes the magic method __mul__(v, 10).

iterables and sequences

Let's briefly review the concept of iterables and sequences in Python. An object is iterable if you can loop over it with the 'for' statement. An object is a sequence if you can access its elements by integer index, i.e. using the syntax s[i]. All sequences are iterable, but not all iterables are sequences.

We've now seen these kinds of sequences: lists, tuples, string, and ranges.

We've also seen these kinds of iterables which are not sequences: sys.stdin, file objects returned by open(), sets, and dictionaries.

Note that if you iterate over a dictionary you get its keys:

d = { 'red' : 1, 'green' : 2, 'blue' : 3 }
for x in d:
    print(x)

produces the output

red
green
blue

As we saw last week, a dictionary has methods keys(), values(), and items() that produce the keys, values, and key-value pairs in the dictionary. These methods all return iterable objects.

list comprehensions

A list comprehension is an expression that loops over a sequence and collects a series of computed values into a list. List comprehensions are powerful and convenient.

For example, consider this loop that builds a list of all perfect squares from 1 to 400:

squares = []
for i in range(1, 21):
    squares.append(i * i)

We may replace the three lines above by a single line with a list comprehension:

squares = [i * i for i in range(1, 21)]

In general, a list comprehension may have the form

[ <expression> for <var> in <sequence> ]

A comprehension of this form will loop over the given <sequence>. On each loop iteration, it sets <var> to the value of an element of the sequence, then evaluates the given <expression>. All results are collected into a list.

Here are some more examples of list comprehensions. This comprehension builds a list of numbers from 1 to 20 and their squares:

>>> [(i, i * i) for i in range(1, 11)]
[(1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, 49), (8, 64), (9, 81), (10, 100)]

We may add 1 to each element in a list:

>>> l = [2, 5, 7, 10]
>>> [i + 1 for i in l]
[3, 6, 8, 11]

Let's write a program that will read a single input line containing a number of integers, separated by spaces. The program will print the sum of the integers. Using a list comprehension, we can write this in a single line:

print(sum([int(w) for w in input().split()]))

Consider Project Euler's Problem 6 (find the difference between the sum of the squares of the first one hundred natural numbers and the square of the sum). Here's a solution using a list comprehension:

sum_of_squares = sum([x * x for x in range(1, 101)])
square_of_sum = sum(range(1, 101)) ** 2
answer = square_of_sum - sum_of_squares

As another example, here's a file animals_en_cz containing English and Czech animal names:

bear medvěd
bird pták
cat kočka
cow kráva
dog pes
goat kozel
horse kůň
mouse myš
pig prase
sheep ovce

Using a list comprehension, we may read it into a dictionary in a single line of code:

with open('animals_en_cz') as f:
    d = dict([line.split() for line in f])

This works because the split() method splits each line above into a 2-element list such as ['bear', 'medvěd']. We can pass a list of these lists to the dict() constructor, which interprets each 2-element list as a key-value pair.

Finally, here's a function that builds a nested list representing a matrix of zeroes:

def empty(rows, cols):
    return [cols * [0] for r in range(rows)]

if clauses in list comprehensions

A list comprehension may have an if clause containing an arbitrary condition. Only list elements that satisfy the condition are included in the generated list.

For example, this comprehension collects all characters in the string 'watermelon' that are in the second half of the alphabet:

>>> [c for c in 'watermelon' if c >= 'n']
['w', 't', 'r', 'o', 'n']

Here's a 1-line solution to Project Euler's Problem 1 (find the sum of all the multiples of 3 or 5 below 1000):

>>> sum([i for i in range(1000) if i % 3 == 0 or i % 5 == 0])
233168

multiple for clauses in list comprehensions

A list comprehension may have more than one 'for' clause. The 'for' clauses represent a nested loop. All values generated by the inner loop are collected into the resulting list.

For example, this comprehension generates all pairs of values (x, y), where 0 ≤ x, y < 3:

>>> [(x, y) for x in range(3) for y in range(3)]
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]

The following comprehension is similar, but in it 'y' only iterates up to the value 'x', so it only generates pairs where y < x:

>>> [(x, y) for x in range(3) for y in range(x)]
[(1, 0), (2, 0), (2, 1)]

The comprehension above is equivalent to the following:

l = []
for x in range(3):
    for y in range(x):
        l.append( (x, y) )

Notice that when there are multiple 'for' clauses in a single comprehension:

The first 'for' clause represents the outer loop.
The comprehension generates a 1-dimensional list, not a list of lists.

Let's write a function to flatten a 2-dimensional matrix, i.e. return a 1-dimensional list of its elements:

def flatten(m):
    return [ x for row in m for x in row ]

For example:

>>> mat = [ [2, 4, 6],
...         [1, 3, 7],
...         [8, 9, 1] ]
>>> flatten(mat)
[2, 4, 6, 1, 3, 7, 8, 9, 1]

Let's now write a program that will read any number of lines of input, each containing any number of integers separated by whitespace. The program will print the sum of all the numbers on all the lines:

import sys

nums = [int(w) for line in sys.stdin for w in line.split()]
print(sum(nums))

It's possible to mix 'for' and 'if' clauses in a comprehension. For example, here's code to generate all pairs (x, y) where 0 ≤ x, y < 3, but skipping those with x = 1:

>>> [(x, y) for x in range(3) if x != 1 for y in range(3)]
[(0, 0), (0, 1), (0, 2), (2, 0), (2, 1), (2, 2)]

As a larger example, here's a comprehension that finds all triples of integers (a, b, c) such that a² + b² = c², with 0 ≤ a < b < c ≤ 20:

>>> [(a, b, c) for c in range(21)
               for b in range(c)
               for a in range(b)
               if a * a + b * b == c * c]
[(3, 4, 5), (6, 8, 10), (5, 12, 13), (9, 12, 15), (8, 15, 17), (12, 16, 20)]

nested list comprehensions

It's possible to nest one list comprehension inside another. Unlike the examples we saw above, this will produce a 2-dimensional result, i.e. a list of lists.

For example, suppose that we'd like to generate an N x N matrix, where each element is the sum of its row and column numbers (where rows and columns are numbered from 0). If N is 4, then the matrix will look like this:

Here is a function that takes n as a parameter and produces this matrix:

def num_matrix(n):
    return [ [ i + j for j in range(n) ] for i in range(n) ]

Notice that in this nested comprehension, the for loop on the right represents the outer loop. The inner comprehension will be evaluated once on each iteration of that loop.

Similarly, we may write a function that produces an identity matrix of dimensions N x N:

def identity_matrix(n):
    return [ [ int(i == j) for j in range(n) ] for i in range(n) ]

Let's try it:

>>> identity_matrix(4)
[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

This works because int(True) is 1, and int(False) is 0.

As another example, here is code that will read a matrix from standard input, where each input line contains the numbers in one row of the matrix:

import sys

m = [ [ int(w) for w in line.split() ] for line in sys.stdin ]

set comprehensions

A set comprehension is similar to a list comprehension, but collects values into a set, not a list.

For example, suppose that we have a set s of integers. We'd like to add one to each integer, and collect the resulting values into a new set t. We can perform this easily using a set comprehension:

>>> s = {7, 9, 100}
>>> t = {x + 1 for x in s}
>>> t
{8, 10, 101}

Suppose that we'd like to know how many distinct integers are products of two integers from 1 to 10. We can solve this in a single line using a set comprehension:

>>> len({ a * b for a in range(1, 11) for b in range(1, 11) })
42

Note that a list comprehension would not produce the same result, since the generated list would contain duplicate elements. With a set comprehension, the duplicates are automatically eliminated.

dictionary comprehensions

A dictionary comprehension is similar to a list or set comprehension, but collects values into a dictionary. For example:

>>> { x : x * x for x in range(15) }
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81, 10: 100, 11: 121, 12: 144, 13: 169, 14: 196}

We can easily swap the key-value pairs in a dictionary using a dictionary comprehension:

>>> d = { 'red' : 'červený', 'green' : 'zelený', 'blue' : 'modrý' }
>>> { v : k for k, v in d.items() }
{'červený': 'red', 'zelený': 'green', 'modrý': 'blue'}