Some of this week's topics are covered in Introducing Python:
Chapter 9. Functions
Generators
Generator Functions
Chapter 19. Be a Pythonista
Test
Here are some additional notes.
We've already seen that many built-in types in Python are iterable, including lists, strings, sets, dictionaries, and others. We can iterate over an iterable object in either of two ways:
1. Most commonly, we use
a for
statement:
for x in obj: print(x)
2. We can use the lower-level functions
iter
and
next
:
it = iter(obj) # get an iterator while True: try: print(next(it)) # get the next value except StopIteration: break
Suppose that we've written a class in Python to implement our own data type, and would like to make it iterable. For example, here is a linked list class:
class Node: def __init__(self, val, next): self.val = val self.next = next class LinkedList: def __init__(self): self.head = None def prepend(self, x): self.head = Node(x, self.head)
We'd like to make the class iterable, so that a caller can do this, for instance:
# make a LinkedList l = LinkedList() # prepend some values for i in range(5): l.prepend(i) # iterate over the LinkedList for x in l: print(x)
As one possible approach, we can implement Python's magic methods
__iter__
and __next__
, which are called by
the iter() and next() functions. First we implement a class
ListIterator
:
class ListIterator: def __init__(self, curr): self.curr = curr def __next__(self): if self.curr == None: raise StopIteration() v = self.curr.val self.curr = self.curr.next return v
And now we add this method to the LinkedList class:
def __iter__(self): return ListIterator(self.head)
Now the caller will be able to iterate over a LinkedList using for
.
Internally, for
will call __iter__
, which
will return a ListIterator object. A
ListIterator keeps track of the current iteration position,
and returns the next value each time the caller calls __next__
.
An easier way to make a class iterable is to use a generator
function, which is a function
that returns a sequence of values using the built-in statement yield
.
Instead of writing the ListIterator class above, we can implement
__iter__
in the LinkedList
class as follows:
def __iter__(self): n = self.head while n != None: yield n.val n = n.next
With this implementation, the caller will be able to iterate over a
LinkedList using the for
statement, just as before. Each
time the for statement needs a new value, the generator function
above will run until it yields a new value, and then its
execution will be suspended. Its execution
will resume (after the yield statement) the next time a value is
requested.
Generator functions have many other uses. For example, consider Project Euler's Problem 1:
Find the sum of all the multiples of 3 or 5 below 1000.
Here is a procedural solution:
def euler1(): sum = 0 for i in range(1000): if i % 3 == 0 or i % 5 == 0: sum += i return sum
This solution mixes the generation of the sequence with the computation of the sum. Instead, we may wish to separate these. We could generate the sequence by appending to a list:
def seq1(): l = [] for i in range(1000): if i % 3 == 0 or i % 5 == 0: l.append(i) return l
Or we can use a list comprehension, which is equivalent:
def seq2(): return [i for i in range(1000) if i % 3 == 0 or i % 5 == 0]
In either case, we can invoke the built-in function sum() on the returned sequence to compute the desired sum.
Both seq1() and seq2() will use more memory than the original
function euler1()
, since they store
the entire list in memory all
at once. This will not be an issue for N = 1000, but if N is some
larger number such as 100,000,000, the memory usage may be
prohibitive. Notice that euler1()
uses only a constant amount of
memory.
So can we generate the list separately from summing it, and still use only a small amount of memory? Yes, by using a generator function:
def seq3(): for i in range(1000): if i % 3 == 0 or i % 5 == 0: yield i
Now sum(seq3())
will once again compute the desired sum.
At no point will the entire sequence be in memory. Instead, each time
that sum
requests a new value, the code
in seq3()
will run until it next yields a value,
at which point its execution will be suspended until the next value
is requested.
Here is yet another way to generate the sequence of numbers:
def seq4(): return (i for i in range(1000) if i % 3 == 0 or i % 5 == 0)
This looks just like seq2(), but we use parentheses instead of brackets, so we have a generator comprehension. Like seq3(), seq4() will return a generator object that can produce values one by one, without needing all values to be in memory.
In the examples above, seq2() is essentially a more compact syntax for seq1(), and seq4() is essentially a more compact syntax for seq3().
A generator function may even produce a infinite sequence of values. For example:
def fibs(): a = 1 b = 1 while True: yield a a, b = b, a + b
This function produces the infinite sequence of Fibonacci numbers. You probably don't want to try to print them all:
for n in fibs(): # an infinite loop print(n)
However you could reasonably print just some of them:
for n in fibs(): if (n > 1_000_000) break print(n)
Infinite sequences can be useful in functional programming, and you will explore them more when you learn Haskell in a later class.
Especially as we start to write larger programs, it is useful to write automated tests that check that our programs behave as expected. Large programs commonly have hundreds or even thousands of test cases.
A unit test tests some small piece of a program's functionality, generally a single function or class. Progams also often include integration tests that check that the entire program (or subsystems of it) work correctly. You may also hear about regression tests, which are specifically designed to test that a bug that has been fixed does not reappear again.
Often we write tests with the help of a test framework that automatically discovers tests in our code, runs them and reports on their success. For Python, I recommend the pytest framework, which is easy to get started with. You can get started with it by reading the helpful documentation on its web site.