Week 6: Notes

Default parameter values

When we define a function in Python, we can specify default parameter values that are used if the caller doesn't specify values for these parameters. For example:

# Split a string into two pieces.
def chop(s, frac = 0.5):
    n = int(len(s) * frac)
    return s[:n], s[n:]

>>> chop('watermelon', 0.3)
('wat', 'ermelon')
>>> chop('watermelon')
('water', 'melon')

In the second call above, we didn't specify a value for frac, so the function used the default value of 0.5.

In a function declaration, parameters with default values must appear at the end of the parameter list.
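
As a quick check of this rule, the sketch below asks Python to parse a definition that puts a parameter with a default before one without. It uses the built-in compile() function so that the error can be caught at run time. The name chop_bad and the variable names are made up for illustration, and the exact error message differs between Python versions, so the code only records whether a SyntaxError occurred.

```python
# Source text for a hypothetical function whose defaulted parameter
# comes before a parameter without a default.
bad_source = "def chop_bad(frac = 0.5, s):\n    return s\n"

try:
    compile(bad_source, '<example>', 'exec')
    rejected = False
except SyntaxError:
    rejected = True

print(rejected)   # True: Python refuses to accept this definition
```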

Many Python functions in the standard library have parameters with default values. For example, the pop(i) method removes a value at a given index i in a list, and returns the value. If the parameter 'i' is not specified, it defaults to -1, meaning that it will remove the value at the end of the list:

>>> l = [22, 44, 66, 88, 110]
>>> l.pop(3)
88
>>> l
[22, 44, 66, 110]
>>> l.pop()
110
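
A few other built-in functions and methods follow the same pattern. The short sketch below (the particular calls are just our own selection) shows int(), whose base defaults to 10, round(), whose number of digits may be omitted, and the split() method, whose separator may be omitted:

```python
print(int('101'))            # 101: the base defaults to 10
print(int('101', 2))         # 5: the same digits read in base 2

print(round(3.14159))        # 3: with no digit count, rounds to an integer
print(round(3.14159, 2))     # 3.14

print('a b  c'.split())      # ['a', 'b', 'c']: splits on runs of whitespace
print('a,b,c'.split(','))    # ['a', 'b', 'c']
```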

You should be aware of one subtle danger in declaring default parameter values. When Python sees a function declaration with default values, it evaluates each of those values to an object which is reused on all invocations of the function. That leads to this surprising behavior:

def add(x, y, l = []):    # l defaults to an empty list
    l.append(x)
    l.append(y)
    return l

>>> add(3, 5, [7, 8])
[7, 8, 3, 5]
>>> add(3, 5)
[3, 5]
>>> add(10, 11)
[3, 5, 10, 11]         # unexpected: 3 and 5 are present in the list!

You can avoid that behavior by using an immutable value such as None as the default:

def add(x, y, l = None):
    if l is None:
        l = []
    l.append(x)
    l.append(y)
    return l

Now the function behaves as you might expect:

>>> add(3, 5)
[3, 5]
>>> add(10, 11)
[10, 11]
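
To see where the shared list lives, we can look at a function's __defaults__ attribute, which holds the default objects that Python created when the function was defined. The sketch below repeats the first (surprising) version of the function under the made-up name add_shared, so that it can be run on its own:

```python
def add_shared(x, y, l = []):    # the default list is created only once
    l.append(x)
    l.append(y)
    return l

first = add_shared(3, 5)
second = add_shared(10, 11)

print(first is second)            # True: both calls returned the same list object
print(add_shared.__defaults__)    # ([3, 5, 10, 11],)
```

With None as the default, the only object stored in __defaults__ is None itself, and a fresh list is created inside the function on every call.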

Keyword arguments

When you call any function in Python, you may optionally specify parameter names when you provide arguments. An argument with a name is called a keyword argument.

For example, consider this function:

def digit_sum(x, y, z):
    return 100 * x + 10 * y + z

We may call it in any of the following ways:

>>> digit_sum(3, 4, 5)
345
>>> digit_sum(x = 3, y = 4, z = 5)
345
>>> digit_sum(z = 5, y = 4, x = 3)
345
>>> digit_sum(3, y = 4, z = 5)
345
>>> digit_sum(3, z = 5, y = 4)
345

Notice that keyword arguments may appear in any order in a function call. However, they must appear after any arguments without names:

>>> digit_sum(y = 4, z = 5, 3)
  File "<stdin>", line 1
    digit_sum(y = 4, z = 5, 3)
                            ^
SyntaxError: positional argument follows keyword argument

If a function's parameters have default values, when we call the function we may want to provide arguments for only some parameters. We can use names to indicate which argument(s) we are providing:

def digit_sum2(x = 8, y = 8, z = 8):
    return 100 * x + 10 * y + z

>>> digit_sum2(y = 2)
828

More generally, it is sometimes good practice to specify parameter names in a function call to make it more readable, so that the meaning of each argument is clear.
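
As a small illustration, here are two calls that arguably read better with names. The sketch repeats chop() and digit_sum2() from above so that it runs on its own; the particular argument values are arbitrary.

```python
def chop(s, frac = 0.5):
    n = int(len(s) * frac)
    return s[:n], s[n:]

def digit_sum2(x = 8, y = 8, z = 8):
    return 100 * x + 10 * y + z

# The name makes it clear what 0.3 means.
print(chop('watermelon', frac = 0.3))    # ('wat', 'ermelon')

# Only x and z are given; y keeps its default value of 8.
print(digit_sum2(z = 1, x = 2))          # 281
```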

Assertions

In Python, the assert statement checks that a given condition is true. If it is false, the program will fail with an AssertionError. For example:

assert 0.0 <= prob <= 1.0

You may add an optional string which will be printed if the assertion fails:

assert 0.0 <= prob <= 1.0, 'probability must be between 0.0 and 1.0'

You can use assertions to verify conditions that should always be true unless there is a bug in the code. If such a condition is false, it is best to find out about it right away, rather than continuing to run and producing an incorrect result later, which may be difficult to debug.

You may also wish to use assertions to verify that arguments passed to a function are valid. For example:

# Compute the average value of numbers in list a

def avg(a):
    assert len(a) > 0, 'list must be non-empty'
    return sum(a) / len(a)
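
Here is a sketch of what a caller sees when the assertion fails. Normally we would just let the program stop with the error; the try/except block and the variable 'message' are only here so that the example can show the message and keep running. Note also that Python skips assert statements entirely when it is run with the -O option, so assertions should not be the only protection against bad input from users.

```python
def avg(a):
    assert len(a) > 0, 'list must be non-empty'
    return sum(a) / len(a)

print(avg([2, 4, 9]))    # 5.0

try:
    avg([])
    message = None
except AssertionError as e:
    message = str(e)     # the optional string given in the assert statement

print(message)           # list must be non-empty
```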

Objects and classes

In Python and many other object-oriented languages, we may define our own data types, which are called classes. After we define a class, we may create instances of the class, which are called objects. An object has a set of attributes, which are data that belongs to the object. A class defines methods which may run on instances of the class.

In fact, the built-in types we've already seen (such as int, float, and bool) are actually classes. We can see this if we call the built-in type() function, which returns a value's type:

>>> type(3)
<class 'int'>
>>> type(5.0)
<class 'float'>

And values of those types (such as 3, 5.0, and True) are actually objects. In Python all values are objects (though that is not actually true in some other languages such as C++).

In this course, you can think of a type and a class as being the same thing. (There is actually a slight technical difference between these, but it is not important for our purposes now. We'll return to this topic in Programming 2.)

As a first example of writing a class, let's create a class Point. This is the smallest possible class definition in Python:

class Point:
    pass

In Python, the 'pass' statement does nothing. It's needed here, since a class definition may not be completely empty.

We may now create objects which are instances of class Point, and assign attributes to them:

>>> p = Point()
>>> p.x = 3
>>> p.y = 4
>>> q = Point()
>>> q.x = 10
>>> q.y = 20
>>> p.x
3
>>> q.y
20

Above, the function Point() is a constructor function that we can call to create a Point object.
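
The following sketch pokes at these objects a little more. It uses the built-in functions type(), isinstance() and vars(); vars() returns a dictionary of the attributes that an object currently has. Notice that attributes belong to individual objects: a newly created Point has none until we assign them.

```python
class Point:
    pass

p = Point()
p.x = 3
p.y = 4

print(type(p) is Point)        # True
print(isinstance(p, Point))    # True
print(vars(p))                 # {'x': 3, 'y': 4}

q = Point()
print(vars(q))                 # {}: q has no attributes yet
```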

However, typically when we write any class we write an initializer method that takes the arguments (if any) that are passed to the constructor, and uses them to initialize the object, typically by creating attributes. For example:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

In Python, the name __init__ is special: it means that this method is an initializer, and will run automatically when a new instance of this class is created.

Every method has a parameter list that begins with a parameter that receives the object on which the method was invoked. Traditionally in Python this parameter is called 'self' (though actually it may have any name). In an initializer method, this parameter receives the object that is being created.

Let's create a couple of Point objects using this new initializer:

>>> p = Point(3, 4)
>>> q = Point(10, 20)
>>> p.x
3
>>> q.y
20

Notice in this example that when we call the constructor, we pass only two arguments, but the parameter list in __init__ has three parameters. That's because 'self' is an extra parameter that is passed automatically.

Let's add a few methods to the Point class. It will now look like this:

import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    # Return the distance from this point to the origin.
    def from_origin(self):
        return math.sqrt(self.x ** 2 + self.y ** 2)

    # Return true if this point is the origin (0, 0).
    def is_origin(self):
        d = self.from_origin()
        return d == 0

    # Return the distance between this point and another point q.
    def distance(self, q):
        return math.sqrt((self.x - q.x) ** 2 + (self.y - q.y) ** 2)

Notice that:

- from_origin() accesses the attributes 'x' and 'y' of the object 'self'
- is_origin() calls the from_origin() method on the same object that it was invoked on
- distance() accesses the attributes 'x' and 'y' of 'self', and also those same attributes of another Point object 'q'

Here is how we can use our class:

>>> p = Point(3, 4)
>>> p.from_origin()
5.0
>>> p.is_origin()
False
>>> q = Point(10, 12)
>>> p.distance(q)
10.63014581273465
>>> q.distance(p)
10.63014581273465
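
One way to see how 'self' gets its value is to call a method through the class and pass the object explicitly. The sketch below (a cut-down copy of Point with only the distance() method) suggests that the two forms of call are equivalent:

```python
import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance(self, q):
        return math.sqrt((self.x - q.x) ** 2 + (self.y - q.y) ** 2)

p = Point(3, 4)
q = Point(10, 12)

d1 = p.distance(q)           # usual form: p is passed as 'self' automatically
d2 = Point.distance(p, q)    # explicit form: we pass p ourselves

print(d1 == d2)              # True
```

In practice we almost always write the first form.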

Here's another class that we wrote in the lecture:

class Line:
    def __init__(self, p, q):     # create a line from p to q
        self.p = p
        self.q = q

    def length(self):
        return self.p.distance(self.q)

Let's try it:

>>> p = Point(3, 4)
>>> q = Point(10, 12)
>>> l = Line(p, q)
>>> l.length()
10.63014581273465

Notice that the __init__ method in the Point and Line classes above simply puts initializer arguments into attributes with the same name. That is a common behavior for simple classes such as these. However, an __init__ method can do anything you like! For example, it may transform the initializer arguments in some way or create attributes with different names. We'll see examples of such initializers in classes that we'll write later (including in our algorithms course).
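
For instance, here is a sketch of a hypothetical class Interval (not one that we wrote in the lecture) whose initializer accepts two endpoints in either order and stores them in attributes with different names:

```python
class Interval:
    def __init__(self, a, b):
        # Store the endpoints in sorted order, whatever order they arrived in.
        self.lo = min(a, b)
        self.hi = max(a, b)

    def width(self):
        return self.hi - self.lo

i = Interval(7, 2)
print(i.lo, i.hi)    # 2 7
print(i.width())     # 5
```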

The repr method

Above, we learned about __init__, which is a method with a special name that Python recognizes and which affects an object's behavior in a certain way. Python actually recognizes many different special method names, all of which begin and end with two underscores. Methods with these names are often called magic methods.

Another magic method in Python is called __repr__. If this method is defined in a class, Python calls it automatically whenever it needs to generate a string representation of an object. For example, this happens when you print out an object in the interactive Python console. By default, the string representation is just a blob of text with an ugly hexadecimal number:

>>> p = Point(3, 4)
>>> q = Point(10, 20)
>>> l = Line(p, q)
>>> p
<__main__.Point object at 0x7f173b1aa8b0>
>>> l
<__main__.Line object at 0x7f173b1aa7c0>

Let's add __repr__ methods to the Point and Line classes to define a nicer string representation for these classes:

class Point:
    ...
    def __repr__(self):
        return f'P({self.x}, {self.y})'

class Line:
    ...
    def __repr__(self):
        return f'{self.p} - {self.q}'

Now Point and Line objects will print more nicely:

>>> p = Point(3, 4)
>>> q = Point(10, 20)
>>> l = Line(p, q)
>>> p
P(3, 4)
>>> l
P(3, 4) - P(10, 20)

Notice that in our __repr__ method in the Line class, we wrote 'self.p' and 'self.q' in curly braces in an f-string. In this situation, Python needs to convert self.p and self.q to strings, so it will call the __repr__ method of the Point class to perform that task.
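
The sketch below shows a few other places where __repr__ is used, assuming (as in our Point class) that the class defines __repr__ and no other string conversion method. The built-in functions repr() and str() both end up calling it, and so does the string representation of a list that contains Point objects:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'P({self.x}, {self.y})'

p = Point(3, 4)

print(repr(p))               # P(3, 4)
print(str(p))                # P(3, 4)
print([p, Point(10, 20)])    # [P(3, 4), P(10, 20)]
```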

Vector class

As a further example, let's write a class Vec that can represent a vector of arbitrary dimension. We'll include a __repr__ method so that vectors print out nicely:

class Vec:
    def __init__(self, *args):
        self.a = args

    # Generate a string representation such as [3 5 10].
    def __repr__(self):
        w = []
        for x in self.a:
            w.append(str(x))
        return '[' + ' '.join(w) + ']'

We've already seen that a parameter such as "*args" allows a function or method to accept an arbitrary number of arguments, which are gathered into a single tuple. So our initializer sets the attribute 'a' to hold a tuple of values in the vector:

>>> v = Vec(2.0, 4.0, 5.0)
>>> v.a
(2.0, 4.0, 5.0)
>>> v
[2.0 4.0 5.0]
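
As a reminder of how this gathering works, here is a tiny sketch using a made-up helper function gather(). It also shows the reverse operation: writing '*' before a list in a call spreads its elements out into separate arguments.

```python
def gather(*args):
    return args

values = [2.0, 4.0, 5.0]

print(gather(2.0, 4.0, 5.0))    # (2.0, 4.0, 5.0)
print(gather(values))           # ([2.0, 4.0, 5.0],): one argument, which is a list
print(gather(*values))          # (2.0, 4.0, 5.0): three separate arguments
```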

Let's now add a method length() for computing a vector's length, plus a method add() for adding two vectors of the same dimension:

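# These methods go inside class Vec. length() calls math.sqrt(), so the file needs 'import math' at the top.
def length(self):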
    s = 0
    for x in self.a:
        s += x * x
    return math.sqrt(s)

def add(self, w):
    assert len(self.a) == len(w.a), 'vectors must have same dimension'
    coords = []       # named 'coords' so that we don't hide the built-in sum() function

    for i in range(len(self.a)):
        coords.append(self.a[i] + w.a[i])

    return Vec(*coords)

The add() method needs to return a vector, so it calls the Vec() constructor to make one. In this call, it uses the '*' operator to explode the values from 'coords' into separate arguments, because the initializer function expects each coordinate to be a separate argument. (The initializer will gather all these arguments back into a tuple.)

Now we can add Vec objects:

$ py -i vector.py 
>>> v = Vec(2.0, 4.0, 5.0)
>>> w = Vec(1.0, 2.0, 3.0)
>>> z = v.add(w)
>>> z
[3.0 6.0 8.0]
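
For reference, here is one way that the whole file might look once the pieces above are put together. This is only a sketch of how we assume the parts fit; the list of coordinates in add() is called 'coords' here, and the try/except block at the end exists only to show that the assertion catches vectors of different dimensions.

```python
import math

class Vec:
    def __init__(self, *args):
        self.a = args

    # Generate a string representation such as [3 5 10].
    def __repr__(self):
        w = []
        for x in self.a:
            w.append(str(x))
        return '[' + ' '.join(w) + ']'

    def length(self):
        s = 0
        for x in self.a:
            s += x * x
        return math.sqrt(s)

    def add(self, w):
        assert len(self.a) == len(w.a), 'vectors must have same dimension'
        coords = []
        for i in range(len(self.a)):
            coords.append(self.a[i] + w.a[i])
        return Vec(*coords)

v = Vec(3.0, 4.0)
print(v.length())               # 5.0
print(v.add(Vec(1.0, 2.0)))     # [4.0 6.0]

try:
    v.add(Vec(1.0, 2.0, 3.0))   # dimensions 2 and 3 do not match
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True

print(mismatch_detected)        # True
```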

Modules

A module is a collection of definitions that Python code can import and use. We've been using modules in Python's standard library for weeks now. For example, the line

import math

lets us use functions in the math module, which is built into the standard library. This statement loads the module into memory and actually makes a variable called 'math' that points to it. Like everything else in Python, a module is actually an object:

>>> import math
>>> math
<module 'math' (built-in)>

After importing a module, we can access any name defined by the module by prefixing it with the module name:

>>> math.sin(0)
0.0

Here is a brief overview of some other ways to import (some of which we have seen before). We may wish to import some of the module's names directly into our namespace, so that we can access them without a prefix. We can do that using a 'from...import' statement:

>>> from math import sin, cos
>>> sin(0) + cos(0)
1.0

We can import all of a module's names using the '*' wildcard character:

>>> from math import *
>>> log(1) + sqrt(1)
1.0

The 'import...as' statement will import a module using an alternative name of our choice:

>>> import math as m
>>> m.ceil(4.5) + m.floor(4.5)
9

We may also specify an alternate name when importing a function from a module:

>>> from math import sin as sn
>>> sn(0)
0.0
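
All of these forms load the same module; they differ only in which names they create in our namespace. The sketch below checks this using the 'is' operator, which tests whether two names refer to the same object:

```python
import math
import math as m
from math import sin
from math import sin as sn

print(type(math))         # <class 'module'>
print(math.__name__)      # math

print(m is math)          # True: 'm' is just another name for the module object
print(sin is math.sin)    # True: 'sin' is the same function object as math.sin
print(sn is sin)          # True
```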

In some larger libraries, modules may have submodules. For example, we've already seen the submodule matplotlib.pyplot, which we imported like this:

>>> import matplotlib.pyplot as plt

Packages

A package is a special kind of module that can contain both top-level definitions and other modules. Packages are also a unit of software distribution. In other words, it's possible (and fairly easy) to write a package of Python code and then make it available for others to install and use on their systems.

To this end, Python includes a package manager called 'pip' that can install and remove packages on your system. As a first experiment, you can run 'pip list' to see a list of packages that are currently installed. (On macOS, you will need to run 'pip3' instead of 'pip'.) When I run this command, I see output that begins like this:

$ pip list
Package                         Version
------------------------------- ---------------
appdirs                         1.4.4
attrs                           22.1.0
banking.statements.nordea       1.3.0
banking.statements.osuuspankki  1.3.4.dev0
bcrypt                          3.2.0
…

I did not explicitly install most of these packages; instead, they were installed by various Python-based programs on my system.
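
If you are curious, you can get similar information from inside a Python program. The sketch below uses the standard library module importlib.metadata (available in Python 3.8 and later) and is only a rough approximation of 'pip list': the variable names and the column width are our own choices, and the output depends entirely on what is installed on your system.

```python
from importlib.metadata import distributions

installed = []
for dist in distributions():
    name = dist.metadata.get('Name')
    if name is not None:        # skip any entry whose metadata lacks a name
        installed.append((name, dist.version))

# Sort by name, ignoring case, and show the first few entries.
installed.sort(key=lambda pair: pair[0].lower())
for name, version in installed[:5]:
    print(f'{name:32}{version}')
```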

We can easily install additional packages. pip finds packages to install in an enormous repository called PyPI (the Python Package Index), which currently lists over 490,000 packages contributed by thousands of users. For example, the sty package lets a program print colored output to the terminal. Let's install it:

$ pip install sty
Defaulting to user installation because normal site-packages is not writeable
Collecting sty
  Using cached sty-1.0.4-py3-none-any.whl (11 kB)
Installing collected packages: sty
Successfully installed sty-1.0.4
$

Now we can produce colored output. For example:

from sty import fg

def out():
    print(f'This text is {fg.green}green{fg.rs} and this text is {fg.red}red{fg.rs}.')

(Note that the blue coloring in the code above is the same syntax coloring that you see in all Python code in these notes. It has nothing to do with colored output!)

Let's run the function above:

>>> out()
This text is green and this text is red.
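
If you share a program like this with someone who has not installed sty, the import will fail with an ImportError. One possible way to cope, sketched below, is to catch that error and fall back to the standard ANSI escape codes for terminal colors. The names 'green', 'red' and 'reset' are our own, and the fallback is an assumption about what a reasonable substitute looks like, not something that sty itself provides.

```python
try:
    from sty import fg
    green, red, reset = fg.green, fg.red, fg.rs
except ImportError:
    # Fallback: ANSI escape codes for green text, red text,
    # and a return to the default foreground color.
    green, red, reset = '\x1b[32m', '\x1b[31m', '\x1b[39m'

def out():
    print(f'This text is {green}green{reset} and this text is {red}red{reset}.')

out()
```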

Note that if you want sty's colored output to work on Windows, you'll need to add an extra hack to your code as described on sty's web site:

import os

os.system('')

(It's unfortunate that this is necessary, and I hope that a future release of sty will work automatically on Windows.)