Consider the Point class that we saw in an earlier lecture:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'P({self.x}, {self.y})'
As we have seen before, a class is actually an object in Python:
>>> Point
<class '__main__.Point'>
Because a class is an object, we can assign attributes to it. For example:
>>> Point.abc = 7
>>> Point.abc + 1
8
Class attributes are distinct from instance attributes. Each instance of the Point class has its own values of x and y, but there is only one value of the abc attribute, shared by all Point instances.
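To see that abc is shared while x and y belong to each instance, here is a short sketch that reuses the Point class above. The built-in vars() returns an object's attribute dictionary, so it shows which namespace each attribute actually lives in.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'P({self.x}, {self.y})'

Point.abc = 7           # one attribute, stored on the class object itself

p = Point(1, 2)
q = Point(3, 4)

print(p.x, q.x)         # 1 3   (each instance has its own x)
print(p.abc, q.abc)     # 7 7   (both instances see the single class attribute)

Point.abc = 8           # changing it on the class changes it for every instance
print(p.abc, q.abc)     # 8 8

# abc lives in the class's namespace, not in the instance's.
print('abc' in vars(Point), 'abc' in vars(p))   # True False
```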
We might use a class attribute to store a constant instance of a class, for example:
>>> Point.origin = Point(0, 0)
As another example, suppose that we have a Student class, and each student has its own integer ID. We could use a class attribute to store the next ID to be assigned:
class Student:
    next_id = 0

    def __init__(self, name):
        self.name = name
        self.id = Student.next_id
        Student.next_id += 1
Notice that we can initialize a class attribute inside a class definition. Python will initialize this attribute only once, when it reads the class definition, not every time it creates a new instance of the class.
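Here is a quick check of that behavior, using the Student class above with a few made-up names. Because next_id is initialized once and then shared, each new student receives the next ID in sequence.

```python
class Student:
    next_id = 0     # class attribute: initialized once, when the class is defined

    def __init__(self, name):
        self.name = name
        self.id = Student.next_id   # take the next available ID...
        Student.next_id += 1        # ...and advance the shared counter

alice = Student('alice')
bob = Student('bob')
carol = Student('carol')
print(alice.id, bob.id, carol.id)   # 0 1 2
print(Student.next_id)              # 3
```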
Like other object-oriented languages, Python supports inheritance, a mechanism that allows a class to extend another class and to change its behavior.
Suppose that we're writing software for a school. We might have a class Person, representing any person at the school. This class might have attributes such as name, address, year of birth, and so on. Some people are students, so we could write a subclass Student that inherits from the Person class. A Student has all the attributes of a Person, and might have additional attributes that represent the courses the student is taking, how many years they have been studying, their expected degree, and so on. In this situation, we say that Person is a superclass (or base class, or parent class) of Student, and Student is a subclass of Person.
Similarly, we could have another subclass Teacher that also inherits from Person, and has other attributes such as their salary, the number of years they have been teaching, and so on.
The world is full of category relationships such as these. If we're writing a program that manages events, we could have a parent class Event and subclasses such as Concert, Film, Play and so on. Or, for a program that manages businesses in a city, we could have a parent class Business and subclasses such as Restaurant, Bank, and Shop.
A subclass automatically inherits its parent's attributes and methods. It may add additional attributes and/or methods, and may also override any of its parent's methods by providing an alternate implementation of them. When overriding a method, the subclass may choose to call the parent's version of the same method as part of its activity.
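As a sketch of these ideas, here is one possible version of the Person, Student and Teacher classes from the school example. The attribute names (birth_year, courses, salary) and the describe() and greet() methods are hypothetical, chosen only for illustration. Student adds an attribute and overrides describe(), calling the parent's version via super() (explained in more detail below); Teacher inherits describe() unchanged.

```python
class Person:
    def __init__(self, name, birth_year):
        self.name = name
        self.birth_year = birth_year

    def describe(self):
        return f'{self.name}, born {self.birth_year}'

    def greet(self):
        return f'Hello, I am {self.name}'

class Student(Person):
    def __init__(self, name, birth_year, courses):
        super().__init__(name, birth_year)   # let Person set name and birth_year
        self.courses = courses               # additional attribute

    # Override describe(), reusing the parent's version as part of the result.
    def describe(self):
        return super().describe() + f', taking {len(self.courses)} courses'

class Teacher(Person):
    def __init__(self, name, birth_year, salary):
        super().__init__(name, birth_year)
        self.salary = salary                 # additional attribute

s = Student('Maria', 2004, ['Algorithms', 'Programming'])
t = Teacher('Tomas', 1975, 50000)
print(s.describe())   # Maria, born 2004, taking 2 courses
print(t.describe())   # Tomas, born 1975   (inherited unchanged)
print(s.greet())      # Hello, I am Maria  (inherited)
```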
To make these ideas more concrete, let's look at an example. Consider a class that implements a stack using a linked list. We saw this in a recent algorithms lecture:
class Node:
    def __init__(self, val, next):
        self.val = val
        self.next = next

class LinkedStack:
    def __init__(self):
        self.head = None

    def push(self, x):
        n = Node(x, self.head)
        self.head = n

    def pop(self):
        assert self.head is not None, 'stack is empty'
        x = self.head.val
        self.head = self.head.next
        return x

    def is_empty(self):
        return self.head is None
We'd now like to write a class StatStack that is like LinkedStack, but has an additional attribute 'count' containing the number of values that are currently on the stack, plus a method avg() that returns their average. We would like avg() to run in O(1) time. To achieve this, StatStack will remember both the number of values currently on the stack and also their sum.
We can write StatStack using inheritance:
class StatStack(LinkedStack):
    def __init__(self):
        super().__init__()   # call the __init__ method in the superclass
        self.total = 0       # total of all values currently on the stack
        self.count = 0       # number of values on the stack

    def push(self, x):
        super().push(x)
        self.total += x
        self.count += 1

    def pop(self):
        x = super().pop()
        self.total -= x
        self.count -= 1
        return x

    def sum(self):
        return self.total

    def avg(self):
        return self.total / self.count
Above, the notation class StatStack(LinkedStack) means that the class StatStack inherits from LinkedStack.
StatStack has an initializer __init__() that first calls the base class initializer:
super().__init__() # call the __init__ method in the superclass
The special function super() returns a proxy for the object that this method was invoked on (just like 'self'), except that method lookups through it start in the parent class. So super().__init__() calls the __init__ method in the parent class of StatStack, namely LinkedStack. After that call returns, __init__ (in the StatStack class) initializes the 'total' and 'count' attributes to 0.
StatStack overrides the push() and pop() methods from its parent class, meaning that StatStack provides its own implementation of these methods. In the push() method, StatStack calls super().push(x) to call the same-named method in the base class. It then runs self.total += x and self.count += 1 to update the running total and count. pop() is similar: it calls super().pop(), then subtracts the popped value from the total and decrements the count.
Let's try it:
>>> s = StatStack()
>>> s.push(5)
>>> s.push(10)
>>> s.push(45)
>>> s.avg()
20.0
>>> s.pop()
45
>>> s.avg()
7.5
>>> s.is_empty()
False
Our calls to push(), avg() and pop() invoke the implementations inside the StatStack class. StatStack has no implementation of is_empty(), so when we call is_empty() it invokes the implementation inside the parent class LinkedStack. In general, when we call any method of an object o, Python will use the most derived implementation, i.e. the one defined in o's class itself or otherwise in the nearest superclass that has a definition of the method.
We may also ask about the type of the object s, using the isinstance() function that we saw before:
>>> isinstance(s, StatStack)
True
>>> isinstance(s, LinkedStack)
True
Notice that s is a StatStack, and s is also a LinkedStack. StatStack inherits from LinkedStack, so every StatStack is a LinkedStack.
In some languages including Python and C++, a class may also have multiple superclasses. This complicates matters somewhat. Suppose that a class A derives from both B and C, and we create an instance 'a' of A and then call a.foo(). If A has no definition of foo() but both B and C do, then which superclass implementation will be invoked? Languages with multiple inheritance (including Python) have rules for resolving situations such as this one, which may be somewhat complex. However, we will not discuss multiple inheritance further in this course.
In designing an object-oriented program, sometimes we must decide whether a class A should be a subclass of a class B. In this situation, it's sometimes useful to ask whether the entities A and B have an is-a or a has-a relationship. If every instance of A is an instance of B, then inheritance makes sense. On the other hand, if every instance of A has an instance of B, then probably it is better to use composition, in which A has an attribute that points to a B.
For example, suppose that we are designing software for an auto repair shop. We might have a class Engine, with attributes such as capacity, horsepower, maker, and so on. We might also have a class Car, with its own set of attributes. Should Car inherit from Engine? In theory you could say that a car is like an engine, but has many additional features. However, this inheritance relationship would be questionable at best. It's more accurate to say that a car has an engine, so really the Car class should have an attribute that points to an Engine object.
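Here is a minimal sketch of the composition approach. The attribute names follow the paragraph above, while the model name, the maker 'Acme', and the horsepower() method on Car are hypothetical. Car stores an Engine in an attribute and delegates to it, rather than inheriting from Engine; one side benefit is that a car's engine can be replaced without changing the Car object's class.

```python
class Engine:
    def __init__(self, capacity, horsepower, maker):
        self.capacity = capacity        # e.g. in liters
        self.horsepower = horsepower
        self.maker = maker

class Car:
    def __init__(self, model, engine):
        self.model = model
        self.engine = engine            # composition: a Car *has* an Engine

    def horsepower(self):
        # Delegate to the engine rather than inheriting from it.
        return self.engine.horsepower

car = Car('Roadster', Engine(1.6, 120, 'Acme'))
print(car.horsepower())                 # 120
car.engine = Engine(2.0, 180, 'Acme')   # the engine can be swapped out
print(car.horsepower())                 # 180
print(isinstance(car, Engine))          # False: a car is not an engine
```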
Beginning programmers sometimes use inheritance in situations where composition would be more appropriate, so it's best to be a bit cautious. If you are unsure about whether to use inheritance or composition in a given situation, composition may be a better choice, especially since in general it leads to more flexibility in your program.
You have undoubtedly noticed that Python's built-in operators and library functions sometimes report errors. For example, the index() method returns the index of the first occurrence of a value in a sequence, but produces a ValueError if the value is not present:
>>> [3, 4, 5, 6].index(5)
2
>>> [3, 4, 5, 6].index(7)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 7 is not in list
Similarly, the open() function produces a FileNotFoundError if a file does not exist:
>>> open('non_existent_file')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'non_existent_file'
These errors are actually exceptions, which are a mechanism supported by Python and many other languages. Any code that wants to report an error can raise (= throw) an exception. In the examples above, index() raised a ValueError exception, and open() raised a FileNotFoundError exception.
By default, an exception will terminate the program. However, Python's try … except statement can catch an exception and handle it in some other way. For example, suppose that we want to open a file and read its contents if the file exists, but still continue executing if it does not. We might write
try:
    f = open('poem')
    text = list(f)    # read all file lines into a list
except FileNotFoundError:
    print('warning: poem not found')
    text = []
print(len(text))
If open() runs without error, the code will read the file and the code in the 'except' block will not run. If open() raises a FileNotFoundError, then the code in the 'except' block will run, and then execution will continue normally after the 'try' statement, since the exception has been handled. If open() raises some other kind of exception, then the 'except' block will not run, and the program will terminate (unless some enclosing code catches the exception that has been raised).
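To try both paths without depending on a file called 'poem', here is a sketch that wraps the same pattern in a hypothetical function read_lines(). It creates a real two-line file in a temporary directory (via the standard tempfile module), plus a path that is guaranteed not to exist. The f.close() call is an addition to the code above, so the file is not left open.

```python
import os
import tempfile

def read_lines(filename):
    try:
        f = open(filename)
        text = list(f)          # read all file lines into a list
        f.close()
    except FileNotFoundError:   # only this exception type is handled here
        print(f'warning: {filename} not found')
        text = []
    return text                 # execution continues normally in both cases

# A real file with two lines.
path = os.path.join(tempfile.mkdtemp(), 'poem')
f = open(path, 'w')
f.write('roses are red\nviolets are blue\n')
f.close()

# A path inside a fresh, empty directory, so it cannot exist.
missing = os.path.join(tempfile.mkdtemp(), 'missing')

print(len(read_lines(path)))      # 2
print(len(read_lines(missing)))   # prints a warning, then 0
```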
Note that an exception is actually an object, i.e. an instance of the built-in class Exception or one of its subclasses. ValueError and FileNotFoundError are classes that inherit (directly or indirectly) from Exception. Each type of exception may have attributes that describe the error that occurred. For example, a FileNotFoundError has an attribute 'filename' containing the name of the file that was not found. In a try … except statement, you can give a name to the exception that was caught and examine its attributes:
name = input('Enter filename: ')
try:
    f = open(name)
    line = f.readline()
except FileNotFoundError as e:
    print(f'file not found: {e.filename}')
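Because exceptions are classes, an except clause catches not only the named class but also all of its subclasses. The sketch below uses issubclass() to inspect the built-in hierarchy (FileNotFoundError inherits from OSError, which inherits from Exception), then catches a FileNotFoundError through an except OSError clause. The function name try_open() is hypothetical.

```python
import os
import tempfile

print(issubclass(ValueError, Exception))          # True
print(issubclass(FileNotFoundError, OSError))     # True
print(issubclass(FileNotFoundError, Exception))   # True (via OSError)

def try_open(name):
    try:
        f = open(name)
        f.close()
        return 'opened'
    except OSError as e:        # also catches FileNotFoundError, a subclass
        # e is the exception object; we examine its class and its filename.
        return f'{type(e).__name__}: {e.filename}'

missing = os.path.join(tempfile.mkdtemp(), 'missing.txt')
print(try_open(missing))        # 'FileNotFoundError: ' followed by the path
```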
You may define your own classes of exceptions. For example, suppose that we're writing a stack class and we'd like to report an error if the caller attempts to pop a value from an empty stack. We may write
class EmptyStackException(Exception):
    pass
(As we've seen before, the 'pass' statement does nothing, and we can use it when writing a class with no methods.)
Now, in our stack class, we might write
def pop(self):
    if self.is_empty():
        raise EmptyStackException()
    …
The raise statement raises an exception. If the caller does not catch the exception, the program will be terminated.
In this example, an EmptyStackException has no attributes. If we like, we could give the EmptyStackException class an __init__() initializer that stores attributes in an instance, and then they would be available to a caller who catches this exception in a try … except statement.
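Here is a sketch of that idea. The 'operation' attribute and the small list-based ListStack class are hypothetical, chosen only to make the example self-contained. The initializer passes a message to Exception's own initializer, so str(e) and tracebacks show it, and it also stores an attribute that a caller can examine after catching the exception.

```python
class EmptyStackException(Exception):
    def __init__(self, operation):
        super().__init__(f'cannot {operation} an empty stack')  # message for str(e)
        self.operation = operation      # attribute a caller can inspect

class ListStack:
    def __init__(self):
        self.items = []

    def push(self, x):
        self.items.append(x)

    def is_empty(self):
        return len(self.items) == 0

    def pop(self):
        if self.is_empty():
            raise EmptyStackException('pop')
        return self.items.pop()

s = ListStack()
s.push(3)
print(s.pop())                  # 3
try:
    s.pop()                     # the stack is now empty
except EmptyStackException as e:
    print(e.operation)          # pop
    print(e)                    # cannot pop an empty stack
```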
Note that an exception raised by a function f need not be caught by the immediate caller of f. Consider this example:
def a():
    f = open('poem')
    print('successful open')
    …
def b():
    a()

def c():
    try:
        b()
    except FileNotFoundError:
        print('file not found')
In this code, the call to open() in a() might raise a FileNotFoundError. There is no try … except statement in a(), or in its caller b(). However, c() contains a try … except statement that can catch a FileNotFoundError. If a FileNotFoundError is raised, Python will unwind the call stack, aborting the execution of a() and then b() until it arrives at the try … except statement in c(), which will catch the exception.
We see that a raise statement is a form of non-local exit that causes execution to jump to some outer point. In fact, we've already seen two other statements in Python that can also jump out from the current execution point: 'break' immediately exits the current loop, and 'return' immediately exits the current function call. 'raise' is more powerful in that it can immediately exit a series of nested function calls, extending from a try … except statement down to the function that raises the exception.
Here's one more point about exceptions. In a try … except statement, you can choose to specify no exception type at all, in which case the statement will catch any exception at all:
try:
    foo()
except:
    print('some error occurred')
However, I don't generally recommend using this form of try … except. A try … except statement is easier to read when it indicates the type of exception that it anticipates. Furthermore, if some sort of error occurs other than the one that you expected to handle, then this form of try … except will catch it, which may lead to behavior that is surprising and difficult to debug.
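Here is a small illustration of that danger. Both hypothetical functions below deliberately contain the same typo (s.strp() instead of s.strip()), which raises an AttributeError. The bare except silently turns the bug into a wrong answer, while the version that catches only ValueError (the error int() raises for bad input) lets the bug surface.

```python
def parse_count(s):
    try:
        return int(s.strp())        # bug: typo for s.strip()
    except:                         # bare except also swallows the AttributeError
        return 0

def parse_count_specific(s):
    try:
        return int(s.strp())        # same typo
    except ValueError:              # only the anticipated error is handled
        return 0

print(parse_count('42'))            # 0, silently wrong

try:
    parse_count_specific('42')
except AttributeError as e:
    print('bug exposed:', e)        # the typo is reported instead of hidden
```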
In some situations we may wish to ensure that a resource is closed or some other action will always be taken, even if an error occurs in our program. In these situations we may use the 'try' statement with a 'finally' clause. The code in the 'finally' clause will always run, even if the code in the 'try' block raises an exception.
For example, suppose that we have a function calculate() that performs some long calculation. Here is a function that calls calculate() 100 times and writes the results to a file:
def write_file():
    f = open('data', 'w')
    for i in range(100):
        f.write(f'{i}: {calculate(i)}\n')
    f.close()
If an exception is raised inside one of the calls to calculate(), then the file will not be closed and data previously written may be lost. Instead, let's use try … finally:
def write_file():
    f = open('data', 'w')
    try:
        for i in range(100):
            f.write(f'{i}: {calculate(i)}\n')
    finally:
        f.close()
Now the file will be closed even if an error occurs.
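We can check this with a runnable sketch. The calculate() below is a hypothetical stand-in that fails when i is 3, and the file is written to a temporary directory rather than to 'data'. The list 'opened' is only there so we can inspect the file object afterwards. Notice that finally does not handle the exception: after f.close() runs, the ValueError still propagates to the caller, which catches it here.

```python
import os
import tempfile

def calculate(i):
    # Hypothetical stand-in for a long calculation that fails partway through.
    if i == 3:
        raise ValueError('calculation failed')
    return i * i

opened = []     # keeps a reference to each file object so we can inspect it later

def write_file(path):
    f = open(path, 'w')
    opened.append(f)
    try:
        for i in range(100):
            f.write(f'{i}: {calculate(i)}\n')
    finally:
        f.close()   # runs whether the loop finishes or calculate() raises

path = os.path.join(tempfile.mkdtemp(), 'data')
try:
    write_file(path)
except ValueError as e:
    print('error:', e)        # error: calculation failed
print(opened[0].closed)       # True

r = open(path)
content = r.read()            # the lines written before the failure
r.close()
print(content, end='')        # three lines: 0: 0, 1: 1, 2: 4
```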
The preceding situation, in which we want to close a file even if an error occurs, is so common that Python has a special statement for it. The 'with' statement assigns a file object (or other resource) to a variable, then runs a block of code. When the block of code exits for any reason, the object is automatically closed, just as if you had called close() on the object.
Let's rewrite the previous function using 'with':
def write_file():
    with open('data', 'w') as f:
        for i in range(100):
            f.write(f'{i}: {calculate(i)}\n')
It's good practice to use 'with' whenever you open a file, to ensure that the file will be closed even if the program exits with an error.
In this course we have already seen several of Python's magic methods: __init__, __repr__, plus operator overloading methods such as __add__ and __sub__.
Let's revisit the Vec class for representing vectors, which we saw in an earlier lecture:
class Vec:
    def __init__(self, *a):
        self.a = a

    def __add__(self, w):
        assert len(self.a) == len(w.a)
        b = []
        for i in range(len(self.a)):
            b.append(self.a[i] + w.a[i])
        return Vec(*b)

    # Generate a string representation such as [3 5 10].
    def __repr__(self):
        w = []
        for x in self.a:
            w.append(str(x))
        return '[' + ' '.join(w) + ']'
As a reminder, the class works like this:
>>> v = Vec(2, 4, 6)
>>> w = Vec(10, 20, 30)
>>> v + w
[12 24 36]
Now suppose that we create two Vec objects with the same coordinates. Are they equal?
>>> v = Vec(2, 4, 6)
>>> w = Vec(2, 4, 6)
>>> v == w
False
Python does not consider them to be equal. By default, two instances of a user-defined class are equal only if they are the same object, i.e. the 'is' operator returns True when applied to the objects.
Now, we may wish to change this. Two vectors are mathematically equal if they have the same coordinates, so in that case it would make sense for them to be equal according to Python's == operator. Python includes a magic method __eq__ that we may use to define equality on any class we like. Let's add an implementation of __eq__ to the Vec class:
# in class Vec
def __eq__(self, w):
    return self.a == w.a
With this method in place, v and w will be equal:
>>> v = Vec(2, 4, 6)
>>> w = Vec(2, 4, 6)
>>> v == w
True
A Vec never changes after it is created (none of its methods modify self.a, which is a tuple), so we might like to use it as a dictionary key. Let's attempt to create a dictionary that maps vectors to integers:
>>> v = Vec(2, 4, 6)
>>> x = Vec(10, 20, 30)
>>> d = {v: 100, x: 200}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'Vec'
Python won't let us.
Python implements a dictionary as a hash table, a data structure that we recently studied in Introduction to Algorithms. The problem here is that we have redefined equality on the Vec class, so Python no longer knows how to compute a hash value for Vec objects that agrees with it. (When a class defines __eq__ but not __hash__, Python sets the class's __hash__ to None, which makes its instances unhashable.) Suppose that v == w. Then d[v] should be the same as d[w], since v and w are mathematically equal. In order for that to work, v and w must have the same hash value. More generally, if two objects are equal using ==, then they must have the same hash value.
And so if we implement the __eq__ magic method on a class, then we must also implement another magic method called __hash__ if we wish to use instances of our class as hash table keys. __hash__ returns a hash code for an object; it is automatically invoked by Python's hash() function, which Python also uses in its dictionary implementation.
Let's add an implementation of __hash__ to the Vec class:
# in class Vec
def __hash__(self):
    return hash(self.a)
Now we can use Vec objects as dictionary keys:
>>> v = Vec(2, 4, 6)
>>> w = Vec(2, 4, 6)
>>> x = Vec(10, 20, 30)
>>> d = {v: 100, x: 200}
>>> d[v]
100
>>> d[w]
100
>>> d[x]
200
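Putting the pieces together, here is a self-contained sketch of Vec with both __eq__ and __hash__ (the __add__ method is omitted for brevity). One assumption goes beyond the version above: __eq__ first checks isinstance() and returns NotImplemented for non-Vec operands, so comparing a Vec with, say, a tuple yields False instead of raising an AttributeError. Sets are also hash-based, so equal vectors collapse into a single set element.

```python
class Vec:
    def __init__(self, *a):
        self.a = a                      # a tuple of coordinates

    def __eq__(self, w):
        if not isinstance(w, Vec):
            return NotImplemented       # let Python fall back for non-Vec operands
        return self.a == w.a

    def __hash__(self):
        return hash(self.a)             # equal Vecs have equal tuples, hence equal hashes

    def __repr__(self):
        return '[' + ' '.join(str(x) for x in self.a) + ']'

v = Vec(2, 4, 6)
w = Vec(2, 4, 6)
print(v == w, v is w)           # True False
print(hash(v) == hash(w))       # True
print(len({v, w, Vec(1, 2)}))   # 2: v and w count as one set element
print(v == (2, 4, 6))           # False (no AttributeError)
```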