Programming 1
Week 13: Notes

Some of this week's topics are covered in Introducing Python:

Here are some additional notes.

Making classes iterable

We've already seen that many built-in types in Python are iterable, including lists, strings, sets, dictionaries, and others. We can iterate over an iterable object in either of two ways:

1. Most commonly, we use a for statement:

for x in obj:
  print(x)

2. We can use the lower-level functions iter and next:

it = iter(obj)  # get an iterator
while True:
  try:
    print(next(it))   # get the next value
  except StopIteration:
    break

Suppose that we've written a class in Python to implement our own data type, and would like to make it iterable. For example, here is a linked list class:

class Node:
    def __init__(self, val, next):
        self.val = val
        self.next = next

class LinkedList:
    def __init__(self):
        self.head = None
    
    def prepend(self, x):
        self.head = Node(x, self.head)

We'd like to make the class iterable, so that a caller can do this, for instance:

# make a LinkedList
l = LinkedList()

# prepend some values
for i in range(5):    
  l.prepend(i)

# iterate over the LinkedList
for x in l:  
  print(x)

As one possible approach, we can implement Python's magic methods __iter__ and __next__, which are called by the iter() and next() functions. First we implement a class ListIterator:

class ListIterator:
    def __init__(self, curr):
        self.curr = curr
        
    def __next__(self):
        if self.curr is None:
            raise StopIteration()
        v = self.curr.val
        self.curr = self.curr.next
        return v

And now we add this method to the LinkedList class:

def __iter__(self):
    return ListIterator(self.head)

Now the caller will be able to iterate over a LinkedList using for. Internally, for will call __iter__, which will return a ListIterator object. A ListIterator keeps track of the current iteration position, and returns the next value each time the caller calls __next__.
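Putting the pieces above together gives the following self-contained sketch. It repeats the Node, ListIterator and LinkedList classes from this section, with one addition that is an assumption of this sketch rather than part of the notes above: ListIterator also gets an __iter__ method that returns itself, which Python's iterator protocol expects of iterators even though the for loop here works without it.

```python
class Node:
    def __init__(self, val, next):
        self.val = val
        self.next = next

class ListIterator:
    def __init__(self, curr):
        self.curr = curr

    def __iter__(self):
        # Addition for this sketch: an iterator returns itself from __iter__.
        return self

    def __next__(self):
        if self.curr is None:
            raise StopIteration
        v = self.curr.val
        self.curr = self.curr.next
        return v

class LinkedList:
    def __init__(self):
        self.head = None

    def prepend(self, x):
        self.head = Node(x, self.head)

    def __iter__(self):
        # Each call creates a fresh iterator starting at the head.
        return ListIterator(self.head)

l = LinkedList()
for i in range(5):
    l.prepend(i)

print(list(l))   # [4, 3, 2, 1, 0]

it = iter(l)     # calls l.__iter__()
print(next(it))  # 4, via it.__next__()
print(next(it))  # 3
```

Because __iter__ on LinkedList builds a new ListIterator each time, the same list can be iterated over repeatedly, and two loops over it do not interfere with each other.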

Generator functions

An easier way to make a class iterable is to use a generator function, which is a function that returns a sequence of values using the built-in statement yield. Instead of writing the ListIterator class above, we can implement __iter__ in the LinkedList class as follows:

def __iter__(self):
    n = self.head
    while n is not None:
        yield n.val
        n = n.next

With this implementation, the caller will be able to iterate over a LinkedList using the for statement, just as before. Each time the for statement needs a new value, the generator function above will run until it yields a new value, and then its execution will be suspended. Its execution will resume (after the yield statement) the next time a value is requested.
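To make the suspend-and-resume behavior visible, here is a small sketch using a hypothetical generator function countdown (not part of the linked list code) that records in a list called log what its body has done so far. The names countdown and log are invented for this illustration.

```python
log = []  # records what the generator body has done so far

def countdown(n):
    # Hypothetical generator, used only to make suspension visible.
    log.append("started")
    while n > 0:
        yield n
        log.append(f"resumed after {n}")
        n -= 1
    log.append("finished")

gen = countdown(2)
print(log)        # []  (calling the function has not run any of its body yet)

print(next(gen))  # 2
print(log)        # ['started']

print(next(gen))  # 1
print(log)        # ['started', 'resumed after 2']

print(list(gen))  # []  (no more values)
print(log)        # ['started', 'resumed after 2', 'resumed after 1', 'finished']
```

Note that the code after each yield runs only when the following value is requested, not at the moment the value is yielded.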

Generator functions have many other uses. For example, consider Project Euler's Problem 1:

Find the sum of all the multiples of 3 or 5 below 1000.

Here is a procedural solution:

def euler1():
    total = 0   # avoid the name "sum", which would shadow the built-in sum()
    for i in range(1000):
        if i % 3 == 0 or i % 5 == 0:
            total += i
    return total

This solution mixes the generation of the sequence with the computation of the sum. Instead, we may wish to separate these. We could generate the sequence by appending to a list:

def seq1():
    l = []
    for i in range(1000):
        if i % 3 == 0 or i % 5 == 0:
            l.append(i)
    return l

Or we can use a list comprehension, which is equivalent:

def seq2():
    return [i for i in range(1000) if i % 3 == 0 or i % 5 == 0]

In either case, we can invoke the built-in function sum() on the returned sequence to compute the desired sum.
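As a quick check, the sketch below sums the list-based sequence and compares the result with the procedural loop. It uses hypothetical variants of the functions above, named multiples_sum and multiples_list, that take the upper bound as a parameter called limit; those names and the parameter are assumptions made for this illustration.

```python
def multiples_sum(limit):
    # Procedural version: accumulate the sum while looping.
    total = 0
    for i in range(limit):
        if i % 3 == 0 or i % 5 == 0:
            total += i
    return total

def multiples_list(limit):
    # Sequence-only version: build the list, let the caller decide what to do with it.
    return [i for i in range(limit) if i % 3 == 0 or i % 5 == 0]

print(multiples_sum(1000))        # 233168
print(sum(multiples_list(1000)))  # 233168
print(max(multiples_list(1000)))  # 999, the same sequence reused for a different computation
```

Separating the sequence from the summation means that the same sequence can feed other built-in functions such as max() or len() without any changes.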

Both seq1() and seq2() will use more memory than the original function euler1(), since they store the entire list in memory at once. This is not an issue for an upper bound of 1000, but if the bound is some larger number such as 100,000,000, the memory usage may be prohibitive. Notice that euler1() uses only a constant amount of memory.

So can we generate the sequence separately from summing it, and still use only a small amount of memory? Yes, by using a generator function:

def seq3():
    for i in range(1000):
        if i % 3 == 0 or i % 5 == 0:
            yield i

Now sum(seq3()) will once again compute the desired sum. At no point will the entire sequence be in memory. Instead, each time that sum requests a new value, the code in seq3() will run until it next yields a value, at which point its execution will be suspended until the next value is requested.
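One rough way to observe the difference is with sys.getsizeof, as in the sketch below. The functions as_list and as_generator are hypothetical parameterized variants of seq2() and seq3(). Keep in mind that sys.getsizeof reports only the size of the container object itself (for a list, its array of references, not the integers it refers to), and that the exact numbers depend on the Python implementation, so this is an indication rather than a precise measurement.

```python
import sys

def as_list(limit):
    return [i for i in range(limit) if i % 3 == 0 or i % 5 == 0]

def as_generator(limit):
    for i in range(limit):
        if i % 3 == 0 or i % 5 == 0:
            yield i

small_list = as_list(1_000)
big_list = as_list(1_000_000)

# The list object grows with the number of elements it holds.
print(sys.getsizeof(small_list), sys.getsizeof(big_list))

# A generator object has the same small size regardless of the bound,
# because it stores only its current state, not the values.
print(sys.getsizeof(as_generator(1_000)), sys.getsizeof(as_generator(1_000_000)))

print(sum(as_generator(1_000)))  # 233168
```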

Here is yet another way to generate the sequence of numbers:

def seq4():
    return (i for i in range(1000) if i % 3 == 0 or i % 5 == 0)

This looks just like seq2(), but we use parentheses instead of brackets, so we have a generator expression (sometimes called a generator comprehension). Like seq3(), seq4() will return a generator object that can produce values one by one, without needing all values to be in memory.

In the examples above, seq2() is essentially a more compact syntax for seq1(), and seq4() is essentially a more compact syntax for seq3().
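One practical difference between the list versions and the generator versions is worth keeping in mind: a list can be traversed as many times as we like, while a generator object produces its values only once. The short sketch below illustrates this with a hypothetical sequence of squares.

```python
squares_list = [i * i for i in range(5)]  # list comprehension
squares_gen = (i * i for i in range(5))   # generator expression

print(sum(squares_list))  # 30
print(sum(squares_list))  # 30 again: the list is still there

print(sum(squares_gen))   # 30
print(sum(squares_gen))   # 0: the generator is already exhausted
```

To traverse the values again, we create a new generator object, for example by calling the function that returns it a second time.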

Infinite sequences

A generator function may even produce an infinite sequence of values. For example:

def fibs():
    a = 1
    b = 1
    while True:
        yield a
        a, b = b, a + b

This function produces the infinite sequence of Fibonacci numbers. You probably don't want to try to print them all:

for n in fibs():  # an infinite loop
  print(n)

However, you could reasonably print just some of them:

for n in fibs():
  if n > 1_000_000:
    break
  print(n)
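As an alternative to breaking out of the loop by hand, the standard library's itertools module can take a finite piece of an infinite generator. The sketch below repeats fibs() from above and uses itertools.islice and itertools.takewhile; using these functions is a suggestion of this sketch, not something the notes above rely on.

```python
from itertools import islice, takewhile

def fibs():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

# The first 10 Fibonacci numbers.
print(list(islice(fibs(), 10)))
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

# All Fibonacci numbers up to 100: stop at the first value that fails the test.
print(list(takewhile(lambda n: n <= 100, fibs())))
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

Both functions are themselves lazy, so fibs() only runs as far as needed to produce the requested values.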

Infinite sequences can be useful in functional programming, and you will explore them more when you learn Haskell in a later class.

Testing code

Especially as we start to write larger programs, it is useful to write automated tests that check that our programs behave as expected. Large programs commonly have hundreds or even thousands of test cases.

A unit test tests some small piece of a program's functionality, generally a single function or class. Programs also often include integration tests, which check that the entire program (or a subsystem of it) works correctly. You may also hear about regression tests, which are specifically designed to check that a bug that has been fixed does not reappear.

Often we write tests with the help of a test framework that automatically discovers tests in our code, runs them, and reports which ones passed or failed. For Python, I recommend the pytest framework, which is easy to get started with. The documentation on its web site is a good place to begin.
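As a minimal sketch of what unit tests for pytest might look like, here are two tests for the generator-based LinkedList from earlier in these notes. The file name test_linked_list.py and the test function names are hypothetical. By default, pytest collects functions whose names begin with test_ from files named test_*.py, and a test passes if it runs without a failing assert. In a real project the classes would live in their own module and be imported into the test file; they are repeated here only to keep the example self-contained.

```python
# test_linked_list.py (hypothetical file name)

class Node:
    def __init__(self, val, next):
        self.val = val
        self.next = next

class LinkedList:
    def __init__(self):
        self.head = None

    def prepend(self, x):
        self.head = Node(x, self.head)

    def __iter__(self):
        n = self.head
        while n is not None:
            yield n.val
            n = n.next

def test_empty_list_yields_nothing():
    assert list(LinkedList()) == []

def test_prepend_puts_newest_value_first():
    l = LinkedList()
    for i in range(3):
        l.prepend(i)
    assert list(l) == [2, 1, 0]
```

With pytest installed, running the command pytest in the directory that contains this file should find and run both tests.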