Week 7: Notes

Debugging

Most programmers spend a fair amount of time debugging. Debugging is something like being a detective, trying to solve the mystery of why a program doesn't behave the way you think it should. It can require a lot of patience, but ultimately it's satisfying when you finally figure out why a program is misbehaving, just like the end of a mystery novel or film. :)

A basic tool for debugging is inserting print statements in your code to reveal selected elements of a program's state as it runs.
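As a minimal sketch of this technique, here is a small function (a hypothetical example, not one from the lecture) with a temporary print statement that reveals the loop's state on every iteration:

```python
def largest(a):
    """Return the largest element of a non-empty list."""
    best = a[0]
    for i in range(1, len(a)):
        # Temporary debugging output: show the state at each iteration.
        print(f'i = {i}, a[i] = {a[i]}, best = {best}')
        if a[i] > best:
            best = a[i]
    return best

print(largest([4, 9, 2, 7]))    # prints three debugging lines, then 9
```

Once the program behaves as expected, you would normally delete such print statements again.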

For some debugging problems, a debugger is a valuable tool. Debuggers are available for all major languages including Python.

Specifically, Visual Studio Code includes a nice interface to a Python debugger. When you have a Python source file open in the editor, look for the triangle icon in the upper right. To start debugging, open the dropdown menu to the right of the triangle and choose "Debug Python File". Your program will start running under the debugger. If an exception occurs, execution will stop and you'll be able to examine the values of local and global variables in the 'Variables' pane in the upper left.

Even before you start running your program under the debugger, you will probably want to create one or more breakpoints. To create a breakpoint on a line, either click anywhere in the line and press F9, or click in the editor margin to the left of the line. A red dot will appear to the left of the line, indicating that there is a breakpoint there. When execution reaches any line with a breakpoint, it will stop. After that, you can step through your program's execution line by line using the Step Into (F11) and Step Over (F10) commands. If the execution point is currently at a function call, Step Into will travel into the function call and stop on the first line of the function. By contrast, Step Over will execute the function without stopping and will then stop on the first line after the function call.

Most debuggers (even for other programming languages) have similar commands and even keyboard shortcuts, so if you become familiar with Python's debugger you should be able to switch to other debuggers easily.

Here is a buggy insertion sort function that we debugged in the lecture:

# Insertion sort, with bugs!
def insertion_sort(a):
    n = len(a)
    for i in range(1, n - 1):
        t = a[i]                # lift up a[i]
        j = i
        while j >= 0 and a[j - 1] > t:
            a[j] = a[j - 1]     # shift element over
            j -= 1
        a[j] = t                # put the lifted element in its place

As an exercise, you may wish to debug it again.
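Before stepping through a sort in the debugger, it helps to have a small input on which it fails. The sketch below is one possible helper, not something from the lecture: the name find_failing_input and its parameters are made up for illustration. It compares an in-place sort function against Python's built-in sorted() on short random lists:

```python
import random

def find_failing_input(sort_fn, trials=1000, max_len=6):
    """Return a short list that sort_fn sorts incorrectly, or None if none is found.

    sort_fn is assumed to sort its argument in place.
    """
    rng = random.Random(0)    # fixed seed, so every run tries the same lists
    for _ in range(trials):
        a = [rng.randint(0, 9) for _ in range(rng.randint(0, max_len))]
        expected = sorted(a)
        b = a.copy()          # keep the original input so we can report it
        sort_fn(b)
        if b != expected:
            return a
    return None

# A correct in-place sort passes every trial, so no failing input is found.
print(find_failing_input(lambda a: a.sort()))    # None
```

Calling find_failing_input(insertion_sort) on the buggy function above should return a short list that you can then sort under the debugger with a breakpoint inside the loop.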

Objects and classes

In Python and many other object-oriented languages, we may define our own data types, which are called classes. After we define a class, we may create instances of the class, which are called objects. An object has a set of attributes, which are data that belongs to the object. A class defines methods which may run on instances of the class.

In fact, the built-in types we've already seen (such as int, float, and bool) are actually classes. We can see this if we call the built-in type() function, which returns a value's type:

>>> type(3)
<class 'int'>
>>> type(5.0)
<class 'float'>

And values of those types (such as 3, 5.0, and True) are actually objects. In Python all values are objects (though that is not actually true in some other languages such as C++).

In this course, you can think of a type and a class as being the same thing. (There is actually a slight technical difference between these, but it is not important for our purposes now. We'll return to this topic in Programming 2.)
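A few more quick checks along the same lines may make this concrete. The particular values below are chosen only for illustration:

```python
# Every value has a type, and that type is a class.
print(type(True))            # <class 'bool'>
print(type('hello'))         # <class 'str'>

# Because values are objects, they have methods that we can call.
print((5.0).is_integer())    # True
print((3).bit_length())      # 2, since 3 is 11 in binary

# isinstance() checks whether an object is an instance of a class.
print(isinstance(3, int))    # True
```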

As a first example of writing a class, let's create a class Point. This is the smallest possible class definition in Python:

class Point:
    pass

In Python, the 'pass' statement does nothing. It's needed here, since a class definition may not be completely empty.

We may now create objects which are instances of class Point, and assign attributes to them:

>>> p = Point()
>>> p.x = 3
>>> p.y = 4
>>> q = Point()
>>> q.x = 10
>>> q.y = 20
>>> p.x
3
>>> q.y
20

Above, the function Point() is a constructor function that we can call to create a Point object.
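Attributes assigned in this way belong to one particular object, not to the class as a whole. The short sketch below illustrates this with the built-in hasattr() function, and shows what happens if we read an attribute that was never assigned:

```python
class Point:
    pass

p = Point()
p.x = 3
q = Point()

# Attributes belong to individual objects: p has an x, but q does not.
print(hasattr(p, 'x'))    # True
print(hasattr(q, 'x'))    # False

# Reading an attribute that was never assigned raises AttributeError.
try:
    print(q.x)
except AttributeError as e:
    print('error:', e)
```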

However, typically when we write any class we write an initializer method that takes the arguments (if any) that are passed to the constructor, and uses them to initialize the object, typically by creating attributes. For example:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

In Python, the name __init__ is special: it means that this method is an initializer, and will run automatically when a new instance of this class is created.

Every method has a parameter list that begins with a parameter that receives the object on which the method was invoked. Traditionally in Python this parameter is called 'self' (though actually it may have any name). In an initializer method, this parameter receives the object that is being created.
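To illustrate, here is a sketch in which the initializer names its first parameter 'this' instead of 'self'. This is legal, though not recommended. The method total() is made up for this example. The last line shows that calling a method on an object is equivalent to calling it through the class and passing the object explicitly:

```python
class Point:
    def __init__(this, x, y):    # 'this' plays the role of 'self'
        this.x = x
        this.y = y

    # Hypothetical method for illustration: the sum of the coordinates.
    def total(self):
        return self.x + self.y

p = Point(3, 4)
print(p.total())         # 7
print(Point.total(p))    # 7: the same call, with the object passed explicitly
```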

Let's create a couple of Point objects using this new initializer:

>>> p = Point(3, 4)
>>> q = Point(10, 20)
>>> p.x
3
>>> q.y
20

Notice in this example that when we call the constructor, we pass only two arguments, but the parameter list in __init__ has three parameters. That's because 'self' is an extra parameter that is passed automatically.
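As a small sketch of what this means in practice, passing too few arguments to the constructor raises a TypeError. The exact wording of the error message varies between Python versions, so it is not reproduced here:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(3, 4)          # self is supplied automatically; we pass only x and y

try:
    Point(3)             # y is missing
except TypeError as e:
    print('error:', e)
```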

Let's add a few methods to the Point class. It will now look like this:

import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    # Return the distance from this point to the origin.
    def from_origin(self):
        return math.sqrt(self.x ** 2 + self.y ** 2)

    # Return true if this point is the origin (0, 0).
    def is_origin(self):
        d = self.from_origin()
        return d == 0

    # Return the distance between this point and another point q.
    def distance(self, q):
        return math.sqrt((self.x - q.x) ** 2 + (self.y - q.y) ** 2)

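Notice that a method can call another method on the same object through self, as is_origin() does when it calls self.from_origin().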

Here is how we can use our class:

>>> p = Point(3, 4)
>>> p.from_origin()
5.0
>>> p.is_origin()
False
>>> q = Point(10, 12)
>>> p.distance(q)
10.63014581273465
>>> q.distance(p)
10.63014581273465

Here's another class that we wrote in the lecture:

class Line:
    def __init__(self, p, q):     # create a line from p to q
        self.p = p
        self.q = q

    def length(self):
        return self.p.distance(self.q)

Let's try it:

>>> p = Point(3, 4)
>>> q = Point(10, 12)
>>> l = Line(p, q)
>>> l.length()
10.63014581273465

Notice that the __init__ method in the Point and Line classes above simply puts initializer arguments into attributes with the same name. That is a common behavior for simple classes such as these. However, an __init__ method can do anything you like! For example, it may transform the initializer arguments in some way or create attributes with different names. We'll see examples of such initializers in classes that we'll write later (including in our algorithms course).
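As one possible illustration, the hypothetical Circle class below (not one of the lecture's classes) has an initializer that accepts a diameter but stores a radius, so the attribute differs from the argument in both name and value:

```python
import math

class Circle:
    # Hypothetical class: the initializer takes a diameter but stores a radius.
    def __init__(self, diameter):
        self.radius = diameter / 2

    def area(self):
        return math.pi * self.radius ** 2

c = Circle(10)
print(c.radius)    # 5.0
print(c.area())
```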

The __repr__ method

Above, we learned about __init__, which is a method with a special name that Python recognizes and which affects an object's behavior in a certain way. Python actually recognizes many different special method names, all of which begin and end with two underscores. Methods with these names are often called magic methods.

Another magic method in Python is called __repr__. If this method is defined in a class, Python calls it automatically whenever it needs to generate a string representation of an object. For example, this happens when you print out an object in the interactive Python console. By default, the string representation is just a blob of text with an ugly hexadecimal number:

>>> p = Point(3, 4)
>>> q = Point(10, 20)
>>> l = Line(p, q)
>>> p
<__main__.Point object at 0x7f173b1aa8b0>
>>> l
<__main__.Line object at 0x7f173b1aa7c0>

Let's add __repr__ methods to the Point and Line classes to define a nicer string representation for these classes:

class Point:
    ...
    def __repr__(self):
        return f'P({self.x}, {self.y})'

class Line:
    ...
    def __repr__(self):
        return f'{self.p} - {self.q}'

Now Point and Line objects will print more nicely:

>>> p = Point(3, 4)
>>> q = Point(10, 20)
>>> l = Line(p, q)
>>> p
P(3, 4)
>>> l
P(3, 4) - P(10, 20)

Notice that in our __repr__ method in the Line class, we wrote 'self.p' and 'self.q' in curly braces in an f-string. In this situation, Python needs to convert self.p and self.q to strings, so it will call the __repr__ method of the Point class to perform that task.
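The same method is used in a few other situations. The sketch below repeats the Point class so that it runs on its own, then shows the built-in repr() and str() functions and a list of points. Here str() falls back to __repr__ because the class defines no separate __str__ method:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'P({self.x}, {self.y})'

p = Point(3, 4)
q = Point(10, 20)

print(repr(p))     # P(3, 4)
print(str(p))      # P(3, 4)
print([p, q])      # [P(3, 4), P(10, 20)]
```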

Vector class

As a further example, let's write a class Vec that can represent a vector of arbitrary dimension. We'll include a __repr__ method so that vectors print out nicely:

class Vec:
    def __init__(self, *args):
        self.a = args

    # Generate a string representation such as [3 5 10].
    def __repr__(self):
        w = []
        for x in self.a:
            w.append(str(x))
        return '[' + ' '.join(w) + ']'

We've already seen that a parameter such as "*args" allows a function or method to accept an arbitrary number of arguments, which are gathered into a single tuple. So our initializer sets the attribute 'a' to hold a tuple of values in the vector:

>>> v = Vec(2.0, 4.0, 5.0)
>>> v.a
(2.0, 4.0, 5.0)
>>> v
[2.0 4.0 5.0]

Let's now add a method length() for computing a vector's length, plus a method add() for adding two vectors of the same dimension:

class Vec:
    ...
    # Note: length() needs 'import math' at the top of the file.
    def length(self):
        s = 0
        for x in self.a:
            s += x * x
        return math.sqrt(s)

    def add(self, w):
        assert len(self.a) == len(w.a), 'vectors must have same dimension'
        coords = []     # not named 'sum', which would shadow the built-in function
        for i in range(len(self.a)):
            coords.append(self.a[i] + w.a[i])
        return Vec(*coords)

The add() method needs to return a vector, so it calls the Vec() constructor to make one. In this call, it uses the '*' operator to unpack the values from 'coords' into separate arguments, because the initializer expects each coordinate to be a separate argument. (The initializer will gather all these arguments back into a tuple.)

Now we can add Vec objects:

$ py -i vector.py 
>>> v = Vec(2.0, 4.0, 5.0)
>>> w = Vec(1.0, 2.0, 3.0)
>>> z = v.add(w)
>>> z
[3.0 6.0 8.0]
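For reference, here is a self-contained sketch of the whole class that also exercises length() and the dimension check. It is a more compact variation, not the code from the lecture: it uses the built-in sum() and zip() functions in place of the explicit loops.

```python
import math

class Vec:
    def __init__(self, *args):
        self.a = args

    # Generate a string representation such as [3 5 10].
    def __repr__(self):
        return '[' + ' '.join(str(x) for x in self.a) + ']'

    def length(self):
        return math.sqrt(sum(x * x for x in self.a))

    def add(self, w):
        assert len(self.a) == len(w.a), 'vectors must have same dimension'
        return Vec(*(x + y for x, y in zip(self.a, w.a)))

v = Vec(3.0, 4.0)
print(v.length())              # 5.0
print(v.add(Vec(1.0, 2.0)))    # [4.0 6.0]

# Adding vectors of different dimensions fails the assertion.
try:
    v.add(Vec(1.0, 2.0, 3.0))
except AssertionError as e:
    print('error:', e)         # error: vectors must have same dimension
```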