Week 10: Notes

inheritance

Like other object-oriented languages, Python supports inheritance, which lets a class extend another class and change its behavior.

Suppose that we're writing software for a school. We might have a class Person, representing any person at the school. This class might have attributes such as name, address, year of birth, and so on. Some people are students, so we could write a subclass Student that inherits from the Person class. A Student has all the attributes of a Person, and might have additional attributes that represent the courses the student is taking, how many years they have been studying, their expected degree, and so on. In this situation, we say that Person is a superclass or base class or parent class.

Similarly, we could have another subclass Teacher that also inherits from Person, and has other attributes such as their salary, the number of years they have been teaching, and so on.

The world is full of category relationships such as these. If we're writing a program that manages events, we could have a parent class Event and subclasses such as Concert, Film, Play and so on. Or, for a program that manages businesses in a city, we could have a parent class Business and subclasses such as Restaurant, Bank, and Shop.

A subclass automatically inherits its parent's attributes and methods. It may add additional attributes and/or methods, and may also override any its parent's methods by providing an alternate implementation of them. When overriding a method, the subclass may choose to call the parent's version of the same method as part of its activity.

Let's look at an example. Consider a class that implements a stack using a linked list. We saw this in a recent algorithms lecture:

class Node:
    def __init__(self, val, next):
        self.val = val
        self.next = next

class LinkedStack:
    def __init__(self):
        self.head = None
    
    def push(self, x):
        self.head = Node(x, self.head)
  
    def pop(self):
        assert self.head != None, 'stack is empty'
        x = self.head.val
        self.head = self.head.next
        return x

    def is_empty(self):
        return self.head == None

We'd now like to write a class StatStack that is like LinkedStack, but has an additional property count containing the number of values that are currently on the stack, plus a method avg() that returns their average. We would like avg() to run in O(1). To achieve this, StatStack will remember both the number of values currently on the stack and also their sum.

We can write StatStack using inheritance:

class StatStack(LinkedStack):
    def __init__(self):
        super().__init__()   # call the __init__ method in the superclass
        self.total = 0
        self.count = 0

    def push(self, x):
        super().push(x)
        self.total += x
        self.count += 1

    def pop(self):
        x = super().pop()
        self.total -= x
        self.count -= 1
        return x

    def sum(self):
        return self.total

    def avg(self):
        return self.total / self.count

Above, the notation "class StackStack(LinkedStack)" means that the class StackStack inherits from LinkedStack.

StatStack has an initializer __init__() that first calls the base class initializer:

super().__init__()   # call the __init__ method in the superclass

The special function super() returns the object that this method was invoked on (just like 'self'), but considers it as an instance of the parent class, so that super().__init__() will call the __init__ method in the parent class of this object. After that call returns, __init__ (in the StatStack class) initializes the 'total' and 'count' attributes to 0.

StatStack overrides the push() and pop() methods from its parent class, meaning that StatStack provides its own implementation of these methods. In the push() method, StatStack calls super().push(x) to call the same-named method in the base class. It then runs self.total += x to update the running total. pop() is similar.

Let's try it:

>>> s = StatStack()
>>> s.push(5)
>>> s.push(10)
>>> s.push(45)
>>> s.avg()
20.0
>>> s.pop()
45
>>> s.avg()
7.5
>>> s.is_empty()
False

Our calls to push(), avg() and pop() invoke the implementations inside the StatStack class. StatStack has no implementation of is_empty(), so when we call is_empty() it invokes the implementation inside the parent class Stack.

A subclass may itself have subclasses. For example, we could have classes like these:

class Business:
    ...

class Restaurant(Business):    # A restaurant is a type of business
    ...

class Cafe(Restaurant):        # A cafe is a kind of restaurant
    ...

In general, when we call any method of an object o, Python will use the most derived implementation, i.e. the one defined in o's class itself or otherwise in the nearest superclass that has a definition of the method. For example, suppose that we have a object c of type Cafe and we call a method c.hours(). If hours() has a definition in the Cafe class, it will be called. Otherwise, Python will look for a definition in the Restaurant class, and then in the top-level Business class, and will call the first definition that it finds.

multiple inheritance

In some languages including Python and C++, a class may also have multiple superclasses. This complicates matters somewhat. Suppose that a class A derives from both B and C, and we create an instance 'a' of A and then call a.foo(). If A has no definition of foo() but both B and C do, then which superclass implementation will be invoked? Languages with multiple inheritance (including Python and some other languages such as C++) have rules for resolving situations such as this one, which may be somewhat complex. However, we will not discuss multiple inheritance further in this course.

inheritance versus composition

In designing an object-oriented program, sometimes we must decide whether a class A should be a subclass of a class B. In this situation, it's sometimes useful to ask whether the entities A and B have an is-a or a has-a relationship. If every instance of A is an instance of B, then inheritance makes sense. On the other hand, if every instance of A has an instance of B, then probably it is better to use composition, in which A has an attribute that points to a B.

For example, suppose that we are designing software for an auto repair shop. We might have a class Engine, with attributes such as capacity, horsepower, maker, and so on. We might also have a class Car, with its own set of attributes. Should Car inherit from Engine? In theory you could say that a car is like an engine, but has many additional features. However, this inheritance relationship would be questionable at best. It's more accurate to say that a car has an engine, so really the Car class should have an attribute that points to an Engine object.

As another example, suppose that we have a class ChessGame that represents a game of chess. It might have methods such as turn(), which returns the player (1 or 2) whose turn it is to move, move(), which makes a move, and winner(), which tells us which player (if any) has won the game. And suppose that we write an AI player for the game. Should we now have a class AIChessGame which is a subclass of ChessGame in which one player's moves are controlled by the AI? I also think this would be questionable. After all, I could write a second AI player, and then I'd like to be able to have the AI players play each other. But if each is in its own subclass, this won't work well. It's more natural to think of an AI player as an agent that uses a ChessGame object. So it would probably make more sense to have an AIPlayer class, which could take a ChessGame e.g. as an initializer argument.

Beginning programmers sometimes use inheritance in situations where composition would be more appropriate, so it's best to be a bit cautious. If you are unsure about whether to use inheritance or composition in a given situation, composition may be a better choice, especially since in general it leads to more flexibility in your program.

testing an object's type

We've already seen that Python includes a type() function that will return the type of any object:

>>> type(4)
<class 'int'>
>>> type('hello')
<class 'str'>

Suppose that I implement a hierarchy of classes, e.g.

class Event:
    def __init__(self, time):
        self.time = time

    ...

class Concert(Event):
    def __init__(self, time, artist):
        super().__init__(time)
        self.artist = artist

    ...

class SportsGame(Event):
    def __init__(self, time, team1, team2):
        super().__init__(time)
        self.team1 = team1
        self.team2 = team2

    ...

Now if I create an instance of any class, the type() function will map the instance to its class:

>>> from datetime import datetime
>>> c = Concert(datetime(2023, 12, 11, 17, 0), 'mig 21')
>>> type(c)
<class '__main__.Concert'>

Python also includes a built-in function isinstance() that tests whether an object has a certain type:

>>> isinstance(3, int)
True
>>> isinstance('hello', str)
True

isinstance() also works with user-defined classes:

>>> isinstance(c, Concert)
True
>>> isinstance(c, Event)
True

Notice that this last query returned True because a Concert is an Event! To put it differently, isinstance(o, C) returns true if C is the class of o, or is any ancestor of that class. This is often useful. For example, if we had many subclasses of Event, we could use isinstance() to tell whether an object represents any kind of event.

class objects and class attributes

Consider the Point class that we saw in an earlier lecture:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'P({self.x}, {self.y})'

A class is actually an object in Python:

>>> Point
<class '__main__.Point'>

Because a class is an object, we can assign attributes to it. For example:

>>> Point.abc = 7
>>> Point.abc + 1
8

Class attributes are distinct from instance attributes. Each instance of the Point class has its own values of x and y, but there is only one value of the abc attribute, shared by all Point instances.

We might use a class attribute to store a constant instance of a class, for example:

>>> Point.origin = Point(0, 0)

As another example, suppose that we have a Student class, and each student has its own integer ID. We'd like to have some sort of variable next_id that remembers the next ID to be assigned. We could make this be a global variable, but it might be nicer to place it in the Student class. So we can use a class attribute:

class Student:
    next_id = 0

    def __init__(self, name):
        self.name = name
        self.id = Student.next_id
        Student.next_id += 1

Notice that we can initialize a class attribute inside a class definition. Python will initialize this attribute only once, as it reads the class definition - not every time it creates a new instance of a class.

class methods

In the Student class above, suppose that we'd like to add a method that resets next_id to zero (e.g. when a new semester begins and a new set of students arrive). We could attempt to write the method like this:

def reset(self):
    Student.next_id = 0

However, with this syntax we'll need to invoke the method on some particular instance of the Student class. That seems wrong, since the method has nothing to do with any particular student.

Instead, we can make this be a class method in the Person class:

    @classmethod
    def reset(cls):
        cls.next_id = 0

A class method is invoked on the class object itself, not on an instance of the class:

>>> Person.reset()

The first parameter of a class method is traditionally called cls, and refers to the class object itself. In this example, cls will be the Person class, so cls.next_id is just the same as Person.next_id.

As another example of a class method, consider the Vec class that we saw a few lectures ago representing a vector with any number of dimensions:

class Vec:
    def __init__(self, *coords):
        self.a = coords
    ...

Suppose that we'd like to write a function that builds a Vec from a string such as "[3.0 4.0 5.0]". We could use a global function, but it's nicer to place this in the Vec class itself. We can use a class method:

    @classmethod
    def from_str(cls, s):
        assert s[0] == '[' and s[-1] == ']'
        a = [float(w) for w in s[1:-1].split()]
        return cls(*a)

Let's call it:

>>> v = Vec.from_str('[3.0 4.0 5.0]')
>>> v.a
(3.0, 4.0, 5.0)

In the line return cls(*a) above, cls is the Vec class itself, so this is equivalent to return Vec(*a).

raising and catching exceptions

You have surely noticed that Python's built-in operators and library functions sometimes report errors. For example, the index() method returns the index of the first occurrence of a value in a sequence, but produces a ValueError if the value is not present:

>>> [3, 4, 5, 6].index(5)
2
>>> [3, 4, 5, 6].index(7)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 7 is not in list

Similarly, the open() function produces a FileNotFoundError if a file does not exist:

>>> open('non_existent_file')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'non_existent_file'

These errors are actually exceptions, which are a mechanism supported by Python and many other languages. Any code that wants to report an error can raise (= throw) an exception. In the examples above, index() raised a ValueError exception, and open() raised a FileNotFoundError exception.

By default, an exception will terminate the program. However, Python's try … except statement can catch an exception and handle it in some other way. For example, suppose that we want to open a file and read its contents if the file exists, but still continue executing if it does not. We might write

try:
    f = open('poem')
    text = list(f)   # read all file lines into a list
except FileNotFoundError:
    print('warning: poem not found')
    text = []
print(len(text))

If open() runs without error, the code will read the file and the code in the 'except' block will not run. If open() raises a FileNotFoundError, then the code in the 'except' block will run, and then execution will continue normally after the 'try' statement since the exception has been handled. If open() raises some other kind of exception, then the except: block will not run, and the program will terminate (unless some enclosing code catches the exception that has been raised).

Note that an exception is actually an object, i.e. an instance of the built-in class Exception or one of its subclasses. ValueError and FileNotFoundError are classes that inherit from Exception. Each type of exception may have attributes that describe the error that occurred. For example, a FileNotFoundError has an attribute 'filename' containing the file that was not found. In a try … except statement, you can give a name to the exception that was caught and can examine its attributes:

name = input('Enter filename: ')

try:
    f = open(name)
    line = f.readline()
except FileNotFoundError as e:
    print(f'file not found: {e.filename}')

You may define your own classes of exceptions. For example, suppose that we're writing a stack class and we'd like to report an error if the caller attempts to pop a value from an empty stack. We may write

class EmptyStackException(Exception):
    pass

(As we've seen before, the 'pass' statement does nothing, and we can use it when writing a class with no methods.)

In the class definition above, EmptyStackException is a subclass of Exception, i.e. a special kind of Exception. We'll discuss subclassing more soon (see the notes below).

Now, in our stack class, we might write

def pop(self):
    if self.is_empty():
        raise EmptyStackException()
    …

The raise statement raises an exception. If the caller does not catch the exception, the program will be terminated.

In this example, an EmptyStackException has no attributes. If we like, we could give the EmptyStackException class an __init__() initalizer that stores attributes in an instance, and then they would be available to a caller who catches this exception in a try … except statement.

Note that an exception raised by a function f need not be caught by the immediate caller of f. Consider this example:

def a():
    f = open('poem')
    print('successful open')
    …


def b():
    a()

def c():
    try:
        b()
    except FileNotFoundError:
        print('file not found')

In this code, the call to open() in a() might raise a FileNotFoundError. There is no try … except statement in a(), or in its caller b(). However, c() contains a try … except statement that can catch a FileNotFoundError. If a FileNotFoundError is raised, Python will unwind the call stack, aborting the execution of a() and then b() until it arrives at the try … except statement in c(), which will catch the exception.

We see that a raise statement is a form of non-local exit that causes execution to jump to some outer point. In fact we've already seen two other statements in Python that can also jump out from the current execution point. Namely, 'break' immediately exist the current loop iteration, and 'return' immediately exits the current function call. 'raise' is more powerful in that it can immediately exit a series of nested function calls extending from a try … catch statement down to the function that raises the exception.

Here's one more point about exceptions. In a try … except statement, you can choose to specify no exception type at all, in which case the statement will catch any exception at all:

try:
    foo()
except:
    print('some error occurred')

However I don't generally recommend using this form of try … except. A try … except statement is easier to read when it indicates the type of exception that it anticipates. Furthermore, if some sort of error occurs other than the one that you expected to handle, then this form of try … except will catch it, which may lead to behavior that is surprising and difficult to debug.

As a final example, let's write a function that searches through a binary tree for any value that is greater than 90. If it finds one, it will throw an exception with the value that was found. The tree is not necessarily a binary search tree, so we must search everywhere in the tree using recursion. Here's the code:

class Found(Exception):
    def __init__(self, x):
        self.x = x

class Node:
    def __init__(self, val, left, right):
        self.val = val
        self.left, self.right = left, right

def find(n):
    if n == None:
        return
    if n.val > 90:
        raise Found(n.val)
    find(n.left)
    find(n.right)

def find_large(tree):
    try:
        find(tree)
        return None
    except Found as e:
        return e.x

Notice that the find_large() function catches a Found exception and returns the value in it. This shows that we can use an exception to escape a deeply nested set of recursive calls, and then catch the exception and return normally. However, I would not generally recommend this as a programming technique. It would probably be clearer and more efficient to return a value without using exceptions, as in this recursive function:

def find_large(n):
    if n == None:
        return None
    if n.val > 90:
        return n.val
    v = find_large(n.left)
    if v != None:
        return v
    return find_large(n.right)

Nevertheless this is an interesting example.

iterating with enumerate()

The enumerate() function lets you iterate over a sequence and its indices simultaneously.

For example, consider this function, which returns the index of the first odd element in a list or other iterable (or -1 if no odd elements are found):

def first_odd(a):
    for i in range(len(a)):
        if a[i] % 2 == 1:
            return i

    return -1

We may rewrite the function using enumerate():

def first_odd(a):
    for i, x in enumerate(a):
        if x % 2 == 1:
            return i

    return -1

On each iteration of the loop, i receives the index of an element in a, and x is the element at that index.

enumerate() actually returns an iterable of pairs. If we like, we may convert it to a list:

>>> list(enumerate([20, 40, 50, 80, 100]))
[(0, 20), (1, 40), (2, 50), (3, 80), (4, 100)]

iterating with zip()

The handy zip() function lets you iterate over two (or more) sequences simultaneously.

For example, let's use zip() to iterate over two lists of integers:

>>> l = [2, 4, 6, 8, 10]
>>> m = [20, 40, 60, 80, 100]
>>> for x, y in zip(l, m):
...     print(f'x = {x}, y = {y}')
x = 2, y = 20
x = 4, y = 40
x = 6, y = 60
x = 8, y = 80
x = 10, y = 100

Notice that on iteration we receive a pair of values, one from each of the lists that we zipped. (Think of a zipper on a jacket that pulls together two edges as it moves upward.)

zip() actually returns an iterable of pairs, which we may collect into a list:

>>> l = [2, 4, 6, 8, 10]
>>> m = [20, 40, 60, 80, 100]
>>> list(zip(l, m))
[(2, 20), (4, 40), (6, 60), (8, 80), (10, 100)]

Note that zip() wil stop as soon as it reaches the end of any list:

>>> l = [2, 4, 6]
>>> m = [20, 40, 60, 80, 100]
>>> list(zip(l, m))
[(2, 20), (4, 40), (6, 60)]

We can use zip() to simplify some loops. For example, consider a function that takes two lists a and b, and produces a new list in which each element is the sum of two corresponding elements from a and b:

# produce a list of sums of values in a and b
def list_sum(a, b):
    assert len(a) == len(b)
    return [a[i] + b[i] for i in range(len(a))]

Instead of iterating over indices, we may use zip():

# produce a list of sums of values in a and b
def list_sum(a, b):
    assert len(a) == len(b)
    return [x + y for x, y in zip(a, b)]

In fact zip() may take any number of arguments. If we give it three lists of integers, it will return a list of triples:

>>> list(zip([1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]))
[(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12)]

As we saw above, zip() converts two lists to a list of pairs:

>>> l = list(zip([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))
>>> l
[(1, 6), (2, 7), (3, 8), (4, 9), (5, 10)]

Now, what will happen if we take these pairs and pass them as separate arguments to zip()? Let's try it:

>>> list(zip((1, 6), (2, 7), (3, 8), (4, 9), (5, 10)))
[(1, 2, 3, 4, 5), (6, 7, 8, 9, 10)]

We get back two tuples that look just like the lists that we started with! This works because zip() accepts any number of arguments. When we give it any number of pairs, it will zip together the first elements of all pairs, as well as the second elements.

And so we can use this clever trick to unzip any list of pairs. Let's write a function that will do that:

def unzip(pairs):
    return zip(*pairs)

For example:

>>> list(unzip([(1, 10), (2, 20), (3, 30), (4, 40)]))
[(1, 2, 3, 4), (10, 20, 30, 40)]

dictionaries with defaults

Suppose we have code that builds a dictionary holding the number of occurrences of each character in a string:

    count = {}
    for c in s:
        if c in count:
            count[c] += 1
        else:
            count[c] = 1

In this code it's a bit of a bother that we have to check whether each key c is already in the dictionary. As an easier alternative, we can use the defaultdict class that's built into Python's standard library. When we create a defaultdict, we provide it with a default value function. When we look up a key K in a defaultdict and it's not found, Python will call this function, passing no arguments. The function will return a default value, and Python will then automatically add a mapping from K to that value. For example:

>>> from collections import defaultdict
>>> count = defaultdict(lambda: 0)
>>> count['red']
0
>>> count['blue'] += 1
>>> count['blue']
1
>>> count
defaultdict(<function <lambda> at 0x7fb3b6370540>, {'red': 0, 'blue': 1})

Note that instead of "lambda: 0" we could just write "int", since the built-in int() function just returns 0, the default value of an integer:

>>> int()
0

Using a defaultdict, we can rewrite the character-counting code above more easily:

from collections import defaultdict

count = defaultdict(int)
for c in s:
    count[c] += 1