Programming 1, 2022-3
Week 10: Notes

class attributes

Consider the Point class that we saw in an earlier lecture:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'P({self.x}, {self.y})'

As we have seen before, a class is actually an object in Python:

>>> Point
<class '__main__.Point'>

Because a class is an object, we can assign attributes to it. For example:

>>> Point.abc = 7
>>> Point.abc + 1
8

Class attributes are distinct from instance attributes. Each instance of the Point class has its own values of x and y, but there is only one value of the abc attribute, shared by all Point instances.
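As a small illustration of the difference, the sketch below repeats a cut-down Point class (without __repr__) and shows that instance attributes vary per instance, while the class attribute abc is seen with the same value through the class and through every instance:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

Point.abc = 7          # class attribute: stored once, on the class itself

p = Point(1, 2)
q = Point(3, 4)

# Instance attributes differ from one instance to the next.
print(p.x, q.x)                   # 1 3

# The class attribute is visible through the class and through every instance.
print(Point.abc, p.abc, q.abc)    # 7 7 7

# Changing it on the class changes what every instance sees.
Point.abc = 8
print(p.abc, q.abc)               # 8 8
```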

We might use a class attribute to store a constant instance of a class, for example:

>>> Point.origin = Point(0, 0)

As another example, suppose that we have a Student class, and each student has its own integer ID. We could use a class attribute to store the next ID to be assigned:

class Student:
    next_id = 0

    def __init__(self, name):
        self.name = name
        self.id = Student.next_id
        Student.next_id += 1

Notice that we can initialize a class attribute inside a class definition. Python will initialize this attribute only once, as it reads the class definition - not every time it creates a new instance of a class.
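To see the counter at work, here is the same Student class followed by a short usage example. The student names are made up for illustration:

```python
class Student:
    next_id = 0   # initialized once, when the class definition is executed

    def __init__(self, name):
        self.name = name
        self.id = Student.next_id   # take the next free ID
        Student.next_id += 1        # advance the shared counter

alice = Student('Alice')
bob = Student('Bob')

print(alice.id, bob.id)    # 0 1
print(Student.next_id)     # 2
```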

inheritance

Like other object-oriented languages, Python supports inheritance, a mechanism that allows a class to extend another class and to change its behavior.

Suppose that we're writing software for a school. We might have a class Person, representing any person at the school. This class might have attributes such as name, address, year of birth, and so on. Some people are students, so we could write a subclass Student that inherits from the Person class. A Student has all the attributes of a Person, and might have additional attributes that represent the courses the student is taking, how many years they have been studying, their expected degree, and so on. In this situation, we say that Person is a superclass or base class or parent class.

Similarly, we could have another subclass Teacher that also inherits from Person, and has other attributes such as their salary, the number of years they have been teaching, and so on.

The world is full of category relationships such as these. If we're writing a program that manages events, we could have a parent class Event and subclasses such as Concert, Film, Play and so on. Or, for a program that manages businesses in a city, we could have a parent class Business and subclasses such as Restaurant, Bank, and Shop.

A subclass automatically inherits its parent's attributes and methods. It may add additional attributes and/or methods, and may also override any of its parent's methods by providing an alternate implementation of them. When overriding a method, the subclass may choose to call the parent's version of the same method as part of its activity.

To make these ideas more concrete, let's look at an example. Consider a class that implements a stack using a linked list. We saw this in a recent algorithms lecture:

class Node:
    def __init__(self, val, next):
        self.val = val
        self.next = next

class LinkedStack:
    def __init__(self):
        self.head = None
    
    def push(self, x):
        n = Node(x, self.head)
        self.head = n
  
    def pop(self):
        assert self.head != None, 'stack is empty'
        x = self.head.val
        self.head = self.head.next
        return x

    def is_empty(self):
        return self.head == None

We'd now like to write a class StatStack that is like LinkedStack, but has an additional attribute 'count' containing the number of values that are currently on the stack, plus a method avg() that returns their average. We would like avg() to run in O(1). To achieve this, StatStack will remember both the number of values currently on the stack and also their sum.

We can write StatStack using inheritance:

class StatStack(LinkedStack):
    def __init__(self):
        super().__init__()   # call the __init__ method in the superclass
        self.total = 0       # total of all values currently on the stack
        self.count = 0       # number of values on the stack        

    def push(self, x):
        super().push(x)
        self.total += x
        self.count += 1

    def pop(self):
        x = super().pop()
        self.total -= x
        self.count -= 1
        return x

    def sum(self):
        return self.total

    def avg(self):
        return self.total / self.count

Above, the notation class StatStack(LinkedStack) means that the class StatStack inherits from LinkedStack.

StatStack has an initializer __init__() that first calls the base class initializer:

        super().__init__()   # call the __init__ method in the superclass

The special function super() returns the object that this method was invoked on (just like 'self'), but considers it as an instance of the parent class, so that super().__init__() will call the __init__ method in the parent class of this object. After that call returns, __init__ (in the StatStack class) initializes the 'total' and 'count' attributes to 0.

StatStack overrides the push() and pop() methods from its parent class, meaning that StatStack provides its own implementation of these methods. In the push() method, StatStack calls super().push(x) to call the same-named method in the base class. It then runs self.total += x to update the running total. pop() is similar.

Let's try it:

>>> s = StatStack()
>>> s.push(5)
>>> s.push(10)
>>> s.push(45)
>>> s.avg()
20.0
>>> s.pop()
45
>>> s.avg()
7.5
>>> s.is_empty()
False

Our calls to push(), avg() and pop() invoke the implementations inside the StatStack class. StatStack has no implementation of is_empty(), so when we call is_empty() it invokes the implementation inside the parent class LinkedStack. In general, when we call any method of an object o, Python will use the most derived implementation, i.e. the one defined in o's class itself or otherwise in the nearest superclass that has a definition of the method.
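The lookup rule can be seen in isolation with a three-level hierarchy. The classes Base, Middle and Derived below are hypothetical and exist only for this sketch. The last line prints the class's __mro__ attribute, which lists the classes in the order Python searches them:

```python
class Base:
    def greet(self):
        return 'Base.greet'

    def name(self):
        return 'Base.name'

class Middle(Base):
    def greet(self):             # overrides Base.greet
        return 'Middle.greet'

class Derived(Middle):
    pass                         # defines no methods of its own

d = Derived()
print(d.greet())   # Middle.greet  (nearest superclass that defines greet)
print(d.name())    # Base.name     (only Base defines name)

# The search order that Python follows for instances of Derived:
print([c.__name__ for c in Derived.__mro__])
# ['Derived', 'Middle', 'Base', 'object']
```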

We may also ask about the type of the object s, using the isinstance() function that we saw before:

>>> isinstance(s, StatStack)
True
>>> isinstance(s, LinkedStack)
True

Notice that s is a StatStack, and s is also a LinkedStack. StatStack inherits from LinkedStack, so every StatStack is a LinkedStack.

multiple inheritance

In some languages including Python and C++, a class may also have multiple superclasses. This complicates matters somewhat. Suppose that a class A derives from both B and C, and we create an instance 'a' of A and then call a.foo(). If A has no definition of foo() but both B and C do, then which superclass implementation will be invoked? Languages with multiple inheritance (including Python) have rules for resolving situations such as this one, which may be somewhat complex. However, we will not discuss multiple inheritance further in this course.
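For the curious, here is a minimal sketch of exactly the situation described above, using the same hypothetical names A, B, C and foo(). In this simple case Python searches the superclasses in the order in which they are listed in the class statement, so B's implementation is chosen. More complicated hierarchies follow more involved rules that are outside the scope of these notes.

```python
class B:
    def foo(self):
        return 'B.foo'

class C:
    def foo(self):
        return 'C.foo'

class A(B, C):    # B is listed first, so it is searched before C
    pass

a = A()
print(a.foo())    # B.foo
```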

inheritance versus composition

In designing an object-oriented program, sometimes we must decide whether a class A should be a subclass of a class B. In this situation, it's sometimes useful to ask whether the entities A and B have an is-a or a has-a relationship. If every instance of A is an instance of B, then inheritance makes sense. On the other hand, if every instance of A has an instance of B, then probably it is better to use composition, in which A has an attribute that points to a B.

For example, suppose that we are designing software for an auto repair shop. We might have a class Engine, with attributes such as capacity, horsepower, maker, and so on. We might also have a class Car, with its own set of attributes. Should Car inherit from Engine? In theory you could say that a car is like an engine, but has many additional features. However, this inheritance relationship would be questionable at best. It's more accurate to say that a car has an engine, so really the Car class should have an attribute that points to an Engine object.
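A minimal sketch of the composition approach might look like this. The attribute 'model' and the sample values are assumptions made up for the example:

```python
class Engine:
    def __init__(self, capacity, horsepower, maker):
        self.capacity = capacity
        self.horsepower = horsepower
        self.maker = maker

class Car:
    def __init__(self, model, engine):
        self.model = model
        self.engine = engine     # composition: a Car has an Engine

engine = Engine(2.0, 150, 'Acme')
car = Car('Roadster', engine)

print(car.engine.horsepower)     # 150
print(isinstance(car, Engine))   # False: a car is not an engine
```

One benefit of this design is that an engine can be replaced by assigning a different Engine object to car.engine, without creating a new Car.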

Beginning programmers sometimes use inheritance in situations where composition would be more appropriate, so it's best to be a bit cautious. If you are unsure about whether to use inheritance or composition in a given situation, composition may be a better choice, especially since in general it leads to more flexibility in your program.

raising and catching exceptions

You have undoubtedly noticed that Python's built-in operators and library functions sometimes report errors. For example, the index() method returns the index of the first occurrence of a value in a sequence, but produces a ValueError if the value is not present:

>>> [3, 4, 5, 6].index(5)
2
>>> [3, 4, 5, 6].index(7)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 7 is not in list

Similarly, the open() function produces a FileNotFoundError if a file does not exist:

>>> open('non_existent_file')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'non_existent_file'

These errors are actually exceptions, which are a mechanism supported by Python and many other languages. Any code that wants to report an error can raise (= throw) an exception. In the examples above, index() raised a ValueError exception, and open() raised a FileNotFoundError exception.

By default, an exception will terminate the program. However, Python's try … except statement can catch an exception and handle it in some other way. For example, suppose that we want to open a file and read its contents if the file exists, but still continue executing if it does not. We might write

try:
    f = open('poem')
    text = list(f)   # read all file lines into a list
except FileNotFoundError:
    print('warning: poem not found')
    text = []
print(len(text))

If open() runs without error, the code will read the file and the code in the 'except' block will not run. If open() raises a FileNotFoundError, then the code in the 'except' block will run, and then execution will continue normally after the 'try' statement since the exception has been handled. If open() raises some other kind of exception, then the except: block will not run, and the program will terminate (unless some enclosing code catches the exception that has been raised).

Note that an exception is actually an object, i.e. an instance of the built-in class Exception or one of its subclasses. ValueError and FileNotFoundError are classes that inherit from Exception. Each type of exception may have attributes that describe the error that occurred. For example, a FileNotFoundError has an attribute 'filename' containing the file that was not found. In a try … except statement, you can give a name to the exception that was caught and can examine its attributes:

name = input('Enter filename: ')

try:
    f = open(name)
    line = f.readline()
except FileNotFoundError as e:
    print(f'file not found: {e.filename}')
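The following sketch checks these claims without asking for input. It builds a path inside a fresh temporary directory (so the file is certain not to exist), catches the resulting exception, and inspects it. The variable names are chosen for the example:

```python
import os
import tempfile

# A path inside a newly created, empty temporary directory.
missing = os.path.join(tempfile.mkdtemp(), 'no_such_file')

try:
    open(missing)
except FileNotFoundError as e:
    caught = e    # keep a reference; 'e' itself is cleared after the block

print(isinstance(caught, FileNotFoundError))   # True
print(isinstance(caught, Exception))           # True: it is also an Exception
print(caught.filename == missing)              # True
```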

You may define your own classes of exceptions. For example, suppose that we're writing a stack class and we'd like to report an error if the caller attempts to pop a value from an empty stack. We may write

class EmptyStackException(Exception):
    pass

(As we've seen before, the 'pass' statement does nothing, and we can use it when writing a class with no methods.)

Now, in our stack class, we might write

    def pop(self):
        if self.is_empty():
            raise EmptyStackException()
        x = self.head.val
        self.head = self.head.next
        return x

The raise statement raises an exception. If the caller does not catch the exception, the program will be terminated.

In this example, an EmptyStackException has no attributes. If we like, we could give the EmptyStackException class an __init__() initializer that stores attributes in an instance, and then they would be available to a caller who catches this exception in a try … except statement.
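Here is one way that could look. This is only a sketch: the attribute 'operation' is an assumption invented for the example, and the small list-based Stack class stands in for whichever stack class you are writing.

```python
class EmptyStackException(Exception):
    def __init__(self, operation):
        # Pass a readable message to the base class initializer.
        super().__init__(f'cannot {operation}: stack is empty')
        self.operation = operation   # hypothetical attribute for this example

class Stack:
    def __init__(self):
        self.items = []

    def is_empty(self):
        return len(self.items) == 0

    def push(self, x):
        self.items.append(x)

    def pop(self):
        if self.is_empty():
            raise EmptyStackException('pop')
        return self.items.pop()

s = Stack()
try:
    s.pop()
except EmptyStackException as e:
    caught = e
    print(f'failed operation: {e.operation}')   # failed operation: pop
    print(e)                                    # cannot pop: stack is empty
```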

Note that an exception raised by a function f need not be caught by the immediate caller of f. Consider this example:

def a():
    f = open('poem')
    print('successful open')

def b():
    a()

def c():
    try:
        b()
    except FileNotFoundError:
        print('file not found')

In this code, the call to open() in a() might raise a FileNotFoundError. There is no try … except statement in a(), or in its caller b(). However, c() contains a try … except statement that can catch a FileNotFoundError. If a FileNotFoundError is raised, Python will unwind the call stack, aborting the execution of a() and then b() until it arrives at the try … except statement in c(), which will catch the exception.
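To watch the unwinding happen, here is a variant of the same three functions that does not depend on any file. It raises a ValueError directly and records each step in a list called 'trace' (a name chosen for this sketch):

```python
trace = []

def a():
    trace.append('a: start')
    raise ValueError('problem in a')
    trace.append('a: end')       # never reached

def b():
    trace.append('b: start')
    a()
    trace.append('b: end')       # never reached: the exception passes through b

def c():
    try:
        b()
    except ValueError as e:
        trace.append(f'c: caught {e}')
    trace.append('c: continues')

c()
print(trace)
# ['b: start', 'a: start', 'c: caught problem in a', 'c: continues']
```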

We see that a raise statement is a form of non-local exit that causes execution to jump to some outer point. In fact we've already seen two other statements in Python that can also jump out from the current execution point. Namely, 'break' immediately exits the current loop, and 'return' immediately exits the current function call. 'raise' is more powerful in that it can immediately exit a series of nested function calls extending from a try … except statement down to the function that raises the exception.

Here's one more point about exceptions. In a try … except statement, you can choose to specify no exception type at all, in which case the statement will catch any exception at all:

try:
    foo()
except:
    print('some error occurred')

However, I don't generally recommend using this form of try … except. A try … except statement is easier to read when it indicates the type of exception that it anticipates. Furthermore, if some sort of error occurs other than the one that you expected to handle, then this form of try … except will catch it, which may lead to behavior that is surprising and difficult to debug.
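The sketch below shows how this can go wrong. The function parse_age() is hypothetical and contains a deliberate typo. With a bare 'except', the typo is silently reported as bad input. With 'except ValueError', the unexpected NameError escapes and exposes the bug.

```python
def parse_age(text):
    return int(txt)    # deliberate bug: the parameter is called 'text'

def read_age_bare(text):
    try:
        return parse_age(text)
    except:                      # catches everything, including the NameError
        return 'invalid number'

def read_age_specific(text):
    try:
        return parse_age(text)
    except ValueError:           # catches only the error we anticipate
        return 'invalid number'

print(read_age_bare('42'))       # invalid number  (misleading: the input was fine)

try:
    read_age_specific('42')
except NameError as e:
    print(f'bug exposed: {type(e).__name__}')   # bug exposed: NameError
```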

try … finally

In some situations we may wish to ensure that a resource is closed or some other action will always be taken, even if an error occurs in our program. In these situations we may use the 'try' statement with a 'finally' clause. The code in the 'finally' clause will always run, even if the code in the 'try' block raises an exception.

For example, suppose that we have a function calculate() that performs some long calculation. Here is a function that calls calculate() 100 times and writes the results to a file:

def write_file():
    f = open('data', 'w')
    for i in range(100):
        f.write(f'i: {calculate(i)}\n')
    f.close()

If an exception is raised inside one of the calls to calculate(), then the file will not be closed and data previously written may be lost. Instead, let's use try … finally:

def write_file():
    f = open('data', 'w')
    try:
        for i in range(100):
            f.write(f'i: {calculate(i)}\n')
    finally:
        f.close()

Now the file will be closed even if an error occurs.
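To see the order of events without touching any files, here is a small sketch in which a hypothetical function risky() records each step in a list. The 'finally' clause runs both when the 'try' block completes normally and when it raises:

```python
log = []

def risky(fail):
    try:
        log.append('working')
        if fail:
            raise RuntimeError('something went wrong')
        log.append('finished')
    finally:
        log.append('cleanup')    # runs on both paths

risky(False)                     # normal completion

try:
    risky(True)                  # the exception still propagates to the caller
except RuntimeError:
    log.append('caught')

print(log)
# ['working', 'finished', 'cleanup', 'working', 'cleanup', 'caught']
```

Note that 'finally' does not handle the exception: after the cleanup code runs, the exception continues to propagate.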

The 'with' statement

The preceding situation, in which we want to close a file even if an error occurs, is so common that Python has a special statement for it. The 'with' statement assigns a file object (or other resource) to a variable, then runs a block of code. When the block of code exits for any reason, the object is automatically closed, just as if you had called close() on the object.

Let's rewrite the previous function using 'with':

def write_file():
    with open('data', 'w') as f:
        for i in range(100):
            f.write(f'i: {calculate(i)}\n')

It's good practice to use 'with' whenever you open a file, to ensure that the file will be closed even if the program exits with an error.
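As a quick check of this behavior, the sketch below raises an exception inside a 'with' block and then inspects the file object's 'closed' attribute. It writes to a file in a temporary directory so that it does not disturb any existing files. The simulated failure is of course an assumption made for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'data')

try:
    with open(path, 'w') as f:
        f.write('first line\n')
        raise RuntimeError('simulated failure')
except RuntimeError:
    pass

print(f.closed)          # True: 'with' closed the file as the block exited

with open(path) as g:
    contents = g.read()
print(contents)          # first line
```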

magic methods for equality

In this course we have already seen several of Python's magic methods: __init__, __repr__, plus operator overloading methods such as __add__ and __sub__.

Let's revisit the Vec class for representing vectors, which we saw in an earlier lecture:

class Vec:
    def __init__(self, *a):
        self.a = a
    
    def __add__(self, w):
        assert len(self.a) == len(w.a)
        b = []
        for i in range(len(self.a)):
            b.append(self.a[i] + w.a[i])
        return Vec(*b)

    # Generate a string representation such as [3 5 10].
    def __repr__(self):
        w = []
        for x in self.a:
            w.append(str(x))
        return '[' + ' '.join(w) + ']'

As a reminder, the class works like this:

>>> v = Vec(2, 4, 6)
>>> w = Vec(10, 20, 30)
>>> v + w
[12 24 36]

Now suppose that we create two Vec objects with the same coordinates. Are they equal?

>>> v = Vec(2, 4, 6)
>>> w = Vec(2, 4, 6)
>>> v == w
False

Python does not consider them to be equal. By default, two instances of a user-defined class are equal only if they are the same object, i.e. the 'is' operator returns True when applied to the objects.
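The following sketch makes the default rule visible, using a cut-down Vec class that has only an initializer. Two separately created objects are unequal even with identical coordinates, while two names for the same object compare equal:

```python
class Vec:
    def __init__(self, *a):
        self.a = a

v = Vec(2, 4, 6)
w = Vec(2, 4, 6)
u = v            # a second name for the same object

print(v == w)    # False: different objects
print(v is w)    # False
print(v == u)    # True: the same object
print(v is u)    # True
```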

Now, we may wish to change this. Two vectors are mathematically equal if they have the same coordinates, so in that case it would make sense for them to be equal according to Python's == operator. Python includes a magic method __eq__ that we may use to define equality on any class we like. Let's add an implementation of __eq__ to the Vec class:

# in class Vec
def __eq__(self, w):
    return self.a == w.a

With this method in place, v and w will be equal:

>>> v = Vec(2, 4, 6)
>>> w = Vec(2, 4, 6)
>>> v == w
True

Vec is an immutable class, so we might like to use it as a dictionary key. Let's attempt to create a dictionary that maps vectors to integers:

>>> v = Vec(2, 4, 6)
>>> x = Vec(10, 20, 30)
>>> d = {v: 100, x: 200}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'Vec'

Python won't let us.

Python implements a dictionary as a hash table, a data structure that we recently studied in Introduction to Algorithms. The problem here is that we have redefined equality on the Vec class, but now Python doesn't know how to compute a hash function for Vec objects. Suppose that v == w. Then d[v] should be the same as d[w], since v and w are mathematically equal. In order for that to work, v and w must have the same hash value. More generally speaking, if two objects are equal using ==, then they must have the same hash value.
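Concretely, when a class defines __eq__ but not __hash__, Python sets the class's __hash__ attribute to None, which is what makes its instances unhashable. The two small classes below are hypothetical and serve only to show the difference:

```python
class Plain:
    pass                          # default equality and default hashing

class WithEq:
    def __eq__(self, other):      # custom equality, no __hash__
        return isinstance(other, WithEq)

print(Plain.__hash__ is None)     # False: instances are hashable
print(WithEq.__hash__ is None)    # True: instances are unhashable

d = {Plain(): 1}                  # fine

try:
    d = {WithEq(): 1}
except TypeError:
    print('WithEq objects cannot be dictionary keys')
```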

And so if we implement the __eq__ magic method on a class, then we must also implement another magic method called __hash__ if we wish to use instances of our class as hash table keys. __hash__ returns a hash code for an object; it is automatically invoked by Python's hash() function, which Python also uses in its dictionary implementation.

Let's add an implementation of __hash__ to the Vec class:

# in class Vec
def __hash__(self):
    return hash(self.a)

Now we can use Vec objects as dictionary keys:

>>> v = Vec(2, 4, 6)
>>> w = Vec(2, 4, 6)
>>> x = Vec(10, 20, 30)
>>> d = {v: 100, x: 200}
>>> d[v]
100
>>> d[w]
100
>>> d[x]
200
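Putting the pieces together, here is a compact, self-contained version of Vec with both methods, followed by a few checks. As an optional refinement that is not part of the class above, this sketch's __eq__ returns the special value NotImplemented when the other object is not a Vec, so that comparing a vector with, say, a string yields False rather than an AttributeError.

```python
class Vec:
    def __init__(self, *a):
        self.a = a                    # a tuple of coordinates

    def __eq__(self, w):
        if not isinstance(w, Vec):
            return NotImplemented     # let Python fall back to its default
        return self.a == w.a

    def __hash__(self):
        return hash(self.a)           # equal tuples have equal hashes

v = Vec(2, 4, 6)
w = Vec(2, 4, 6)

print(v == w)                 # True
print(hash(v) == hash(w))     # True: equal objects, equal hash values
print(len({v, w}))            # 1: a set treats them as the same element
print(v == 'hello')           # False
```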