Week 6: Notes

None

None is a special value in Python that represents nothingness. It is often useful to represent the absence of a value.

For example, here's a program that computes the maximum of all numbers read from standard input. It keeps the maximum in a variable 'mx', which is initialized to None before any numbers are read:

import sys

mx = None

for line in sys.stdin:
    x = int(line)
    if mx is None or x > mx:
        mx = x

print(f'max = {mx}')
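
The same pattern can be tried without standard input by wrapping it in a function. The sketch below assumes a hypothetical helper max_of() that accepts any iterable of numbers; note that it returns None when there are no numbers at all.

```python
# max_of is a hypothetical helper, not part of the original program.
def max_of(numbers):
    mx = None                      # no maximum seen yet
    for x in numbers:
        if mx is None or x > mx:
            mx = x
    return mx

print(max_of([4, 9, 2]))    # prints 9
print(max_of([]))           # prints None
```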

Be aware that the Python interactive interpreter prints nothing at all when you give it an expression whose value is None:

>>> x = None
>>> x
>>>

(None is similar to the special value null in some other languages such as Java and C#.)

If a function does not return a value explicitly, then it will return None. Also, if a return statement does not contain a value, it will return None. For example, consider this function:

def prime_check(n):
    for i in range(2, n):
        if n % i == 0:
            print('composite')
            return
        
    print('prime')

Let's call it:

>>> prime_check(14)
composite
>>> prime_check(23)
prime

In both cases the function returned None, and so the interactive interpreter didn't print any return value for the function.
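
One way to observe the returned None is to store the result of a call in a variable and test it. The sketch below uses a hypothetical function greet() that has no return statement:

```python
def greet(name):
    print(f'hello, {name}')    # no return statement

result = greet('Ada')          # prints: hello, Ada
print(result is None)          # prints: True
```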

local and global variables

Consider this Python program:

x = 7

def abc(a):
    i = a + x
    return i

def ha():
    i = 4
    print(abc(2))
    print(x + i)
ha()

print(x)

The variable x declared at the top is a global variable. Its value is visible everywhere: both inside the functions abc() and ha(), and also in the statement print(x) at the end of the program.

The variables i declared inside abc() and ha() are local variables. They are different variables: when the line "i = a + x" executes inside abc(), that does not change the value of i in ha().

Local variables are a fundamental feature of every modern programming language. Because a local variable's scope (the area of the program where it is visible) is small, it is easy to understand how the variable will behave. In larger programs, I recommend making variables local whenever possible.

Now consider this program, with two global variables at the top:

a = 10
b = 20

def abc(n):
    a = n
    return a + b

Let's try it:

>>> abc(50)
70
>>> a
10
>>>

We can see that the assignment "a = n" above did not set the value of the global a. Instead, it set a local variable called a. However, the statement "return a + b" did read the value of the global variable b.

What if we want the function abc() to set the global a? To achieve that, we can add a global declaration to the function:

def abc(n):
    global a
    a = n
    return a + b

Now calling abc() will set a:

>>> abc(100)
120
>>> a
100

We see that a function can read the value of a global variable without declaring it as global. But if a function wants to assign to a global variable, it must declare the variable as global.

To be more precise, here is how Python determines whether each variable in a function is local or global:

If there is a global declaration, the variable is global.
Otherwise, if the function ever assigns to the variable, it is considered local. If not, it is considered global (and must be defined somewhere outside the function).
Note that it is impossible to write a function that uses both a local variable "x" and also a global variable "x". Each variable name such as "x" is either always local or always global within a single function body.
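
A common consequence of these rules: a function that reads a global and later assigns to the same name will fail, because the assignment makes the name local throughout the function body. Here is a minimal sketch, using the hypothetical names count and broken_increment():

```python
count = 0    # a global variable

def broken_increment():
    # This function assigns to 'count' below, so Python treats 'count'
    # as local everywhere in the body. The read on the next line fails
    # because the local variable has no value yet.
    print(count)
    count = 1

try:
    broken_increment()
    outcome = 'no error'
except UnboundLocalError:
    outcome = 'UnboundLocalError'

print(outcome)    # prints UnboundLocalError
print(count)      # prints 0: the global was never changed
```

Adding "global count" at the top of the function would make both the read and the assignment refer to the global variable.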

When you write a larger program, it's best to have few global variables whose values may change, or even none. They can make a program hard to understand, since any code in the entire program could possibly modify them.

On the other hand, globals that you only read from are fine - essentially they just represent constant data. In Python, by convention we usually capitalize the names of such globals to emphasize that they are constant. For example:

SECONDS_PER_DAY = 24 * 3600
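
For instance, a function can read such a constant without any global declaration. Here to_seconds() is a hypothetical helper used only for illustration:

```python
SECONDS_PER_DAY = 24 * 3600

def to_seconds(days):
    # Reads the global constant; never assigns to it.
    return days * SECONDS_PER_DAY

print(to_seconds(2))    # prints 172800
```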

passing by value and reference

Consider this function, which adds all values in a list (or other iterable):

def sum_of(a):
    s = 0
    for x in a:
        s += x
    return s

Let's call it with a list of 1,000,000 elements:

>>> l = 1_000_000 * [5]
>>> sum_of(l)
5000000

When we called the function, did it copy the list when it passed it to the function? Actually no. The function only received a pointer to the list. To see this more easily, let's write a function that modifies a list element:

def inc(a):
    a[0] += 1

And let's call it:

>>> l = [2, 4, 6]
>>> inc(l)
>>> l
[3, 4, 6]

We see that Python passes lists by reference. When the program calls inc(l), then as the function runs l and a are the same list. If we modify the list in the function, the change is visible in the caller.

The opposite of passing by reference is passing by value, in which the function receives a copy of the data that is passed. In some languages (e.g. C++), an array may be passed by value.

For a large data structure such as an array with 1,000,000 elements, passing by reference is much more efficient than passing by value. In fact, Python passes a reference whenever you pass any value to a function. That includes small values such as None, True and False, though for immutable values like these you cannot observe the difference.
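
One way to check that no copy is made is the 'is' operator, which tests whether two references point to the same object. If a function should not be able to change the caller's list, the caller can pass an explicit copy. The names below (data, is_data, inc) are for illustration only:

```python
data = [2, 4, 6]

def is_data(a):
    return a is data     # True only if a refers to the very same list object

def inc(a):
    a[0] += 1

print(is_data(data))           # prints True: the function received the same object
print(is_data(data.copy()))    # prints False: a copy is a different object

inc(data.copy())     # the function modifies a copy
print(data)          # prints [2, 4, 6]: the original is unchanged

inc(data)            # the function modifies the caller's list
print(data)          # prints [3, 4, 6]
```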

Now, strictly speaking, Python actually passes references to objects by value. This means that when you pass an object to a function, the function parameter receives a reference to the object, but it is a separate reference from the one in the caller. For example:

def foo(x):
    x = 'hello'

>>> y = 'yo'
>>> foo(y)
>>> y
'yo'

def bar(a):
    a[0] = 100
    a = [2, 4, 6]

>>> b = [0, 0, 0]
>>> bar(b)
>>> b
[100, 0, 0]

When we call the function foo(), x refers to the same string as y (the string is not copied). However, assigning to x does not change y, because x is a separate reference.

Similarly, when we call the function bar(), a refers to the same list as b (the list is not copied). If we modify that list, the change is visible in b. However, assigning to a does not change b, because a is a separate reference.

Many other languages such as Java, C#, and JavaScript use the same calling mechanism.

variable numbers of arguments

We may sometimes wish to write a function that can take a variable number of arguments. For example, we may wish to write a function that returns the average of all its arguments, no matter how many there are:

>>> avg(1.0, 3.0, 8.0)
4.0
>>> avg(2.0, 4.0, 6.0, 8.0, 10.0)
6.0

We may write this function using a parameter preceded by the character '*', which means that the parameter should gather all of its arguments into a tuple:

def avg(*args):
    return sum(args) / len(args)

When we make the call 'avg(1.0, 3.0, 8.0)', inside the function the variable 'args' has the value (1.0, 3.0, 8.0), which is a tuple of 3 values. (Recall that the sum() and len() functions work on tuples just as they do on lists and other sequences.)

A parameter preceded by '*' can have any name, but the name 'args' is conventional in Python.
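
A '*' parameter can also follow ordinary parameters, in which case it gathers only the remaining arguments. Here is a small sketch with a hypothetical function scaled_sum(), which also shows a name other than 'args':

```python
def scaled_sum(factor, *values):
    # 'values' is a tuple holding every argument after the first
    return factor * sum(values)

print(scaled_sum(2, 1, 2, 3))    # prints 12
print(scaled_sum(5))             # prints 0, since values is the empty tuple ()
```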

When we call a function we may sometimes have a list (or other sequence) that contains the arguments we'd like to pass. In this case, we can write '*' before an argument to tell Python to explode its values into separate arguments. For example, consider this function:

def add(x, y, z, q):
    return x + y + z + q + 10

When we call it, we may specify its arguments individually, or by exploding them from a list:

>>> add(10, 20, 30, 40)
110
>>> l = [10, 20, 30, 40]
>>> add(*l)
110

More commonly, this situation occurs when a function accepts a variable number of arguments, and we want to pass arguments from a list. For example, calling the avg() function we defined above:

>>> l = [3.0, 4.0]
>>> avg(*l)
3.5
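
The '*' in a call works with any sequence, not only lists. Here is a brief sketch that repeats the avg() definition so that it runs on its own:

```python
def avg(*args):
    return sum(args) / len(args)

t = (2.0, 4.0, 6.0)
print(avg(*t))              # prints 4.0
print(avg(*range(1, 6)))    # prints 3.0 (the average of 1, 2, 3, 4, 5)
print(avg(1.0, *t))         # prints 3.25 (an exploded argument can be mixed with others)
```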

default parameter values

When we define a function in Python, we can specify default parameter values that are used if the caller doesn't specify values for these parameters. For example:

# Split a string into two pieces.
def chop(s, frac = 0.5):
    n = int(len(s) * frac)
    return s[:n], s[n:]

>>> chop('watermelon', 0.3)
('wat', 'ermelon')
>>> chop('watermelon')
('water', 'melon')

In the second call above, we didn't specify a value for frac, so the function used the default value of 0.5.

In a function declaration, parameters with default values must appear at the end of the parameter list.
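
As a sketch of this rule, the hypothetical function repeat() below places its defaulted parameters last. The second part uses the built-in compile() function to check, without crashing the program, that Python rejects a declaration in which a parameter without a default follows one with a default:

```python
def repeat(s, times = 2, sep = ''):
    return sep.join([s] * times)

print(repeat('ab'))            # prints abab
print(repeat('ab', 3, '-'))    # prints ab-ab-ab

# A declaration with the defaulted parameter first is a syntax error.
bad_source = 'def repeat(times = 2, s): pass'
try:
    compile(bad_source, '<example>', 'exec')
    result = 'accepted'
except SyntaxError:
    result = 'SyntaxError'

print(result)                  # prints SyntaxError
```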

Many Python functions in the standard library have parameters with default values. For example, the pop(i) method removes a value at a given index i in a list, and returns the value. If the parameter 'i' is not specified, it defaults to -1, meaning that it will remove the value at the end of the list:

>>> l = [22, 44, 66, 88, 110]
>>> l.pop(3)
88
>>> l
[22, 44, 66, 110]
>>> l.pop()
110

You should be aware of one subtle danger in declaring default parameter values. When Python sees a function declaration with default values, it evaluates each of those values to an object which is reused on all invocations of the function. That leads to this surprising behavior:

def add(x, y, l = []):    # l defaults to an empty list
    l.append(x)
    l.append(y)
    return l

>>> add(3, 5, [7, 8])
[7, 8, 3, 5]
>>> add(3, 5)
[3, 5]
>>> add(10, 11)
[3, 5, 10, 11]         # unexpected: 3 and 5 are present in the list!

You can avoid that behavior by using an immutable value such as None as the default:

def add(x, y, l = None):
    if l is None:
        l = []
    l.append(x)
    l.append(y)
    return l

Now the function behaves as you might expect:

>>> add(3, 5)
[3, 5]
>>> add(10, 11)
[10, 11]
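
To see why the first version misbehaved, one can inspect a function's __defaults__ attribute, which holds the default objects created when the function was defined. Here add_shared() is a hypothetical copy of the first add(), renamed so that it does not clash with the corrected version:

```python
def add_shared(x, y, l = []):    # same pitfall as the first add() above
    l.append(x)
    l.append(y)
    return l

print(add_shared.__defaults__)   # prints ([],)
first = add_shared(3, 5)
print(add_shared.__defaults__)   # prints ([3, 5],)
second = add_shared(10, 11)
print(first is second)           # prints True: both calls returned the same list
```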

keyword arguments

When you call any function in Python, you may optionally specify parameter names when you provide arguments. An argument with a name is called a keyword argument.

For example, consider this function:

def digit_sum(x, y, z):
    return 100 * x + 10 * y + z

We may call it in any of the following ways:

>>> digit_sum(3, 4, 5)
345
>>> digit_sum(x = 3, y = 4, z = 5)
345
>>> digit_sum(z = 5, y = 4, x = 3)
345
>>> digit_sum(3, z = 5, y = 4)
345

Notice that keyword arguments may appear in any order in a function call.

If a function's parameters have default values, when we call the function we may want to provide arguments for only some parameters. We can use names to indicate which argument(s) we are providing. For example:

def digit_sum2(x = 8, y = 8, z = 8):
    return 100 * x + 10 * y + z

>>> digit_sum2(y = 2)
828

You may sometimes want to specify parameter names in a function call even when they are not necessary, since they can make a function call more readable (especially if it has many arguments).
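
As a sketch of the readability point, compare two equivalent calls to a hypothetical function make_label(). The second call makes it clear what each value means:

```python
def make_label(text, width, centered):
    if centered:
        return text.center(width)    # pad on both sides
    return text.ljust(width)         # pad on the right only

a = make_label('hi', 6, True)
b = make_label('hi', width = 6, centered = True)
print(a == b)    # prints True
```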

A function may have keyword-only arguments, which are listed after a * in a function declaration. For example:

def foo(x, *, y = 1, z = 0):
    return x * y + z

In this declaration y and z are keyword-only arguments. The caller must specify parameter names when passing these:

>>> foo(10)
10
>>> foo(10, 2, 3)
Traceback (most recent call last):
  File "<python-input-3>", line 1, in <module>
    foo(10, 2, 3)
    ~~~^^^^^^^^^^
TypeError: foo() takes 1 positional argument but 3 were given
>>> foo(10, y = 2, z = 3)
23

A keyword-only argument does not need a default value. If it has none, the caller must always pass it by name.

Many functions and methods in Python's standard library have keyword-only arguments. For example, the sort() method sorts a list. It looks like this in our Python quick reference:

        l.sort(*, reverse = False, key = None)

We see that reverse and key are keyword-only arguments. Let's not worry about key for now (we'll see it in a later lecture). However, reverse will be useful for us even now: it tells Python to sort the list in reverse order. Because it's a keyword-only argument, we must specify it by name when calling this method:

>>> l = [5, 2, 8, 1, 3]

>>> l.sort(True)
Traceback (most recent call last):
  File "<python-input-6>", line 1, in <module>
    l.sort(True)
    ~~~~~~^^^^^^
TypeError: sort() takes no positional arguments

>>> l.sort(reverse = True)
>>> l
[8, 5, 3, 2, 1]
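
The built-in sorted() function accepts the same keyword-only reverse argument, but it returns a new list and leaves the original unchanged. A short sketch:

```python
l = [5, 2, 8, 1, 3]

descending = sorted(l, reverse = True)
print(descending)    # prints [8, 5, 3, 2, 1]
print(l)             # prints [5, 2, 8, 1, 3] (unchanged)

l.sort()             # reverse defaults to False
print(l)             # prints [1, 2, 3, 5, 8]
```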

objects and classes

In Python and many other object-oriented languages, we may define our own data types, which are called classes. After we define a class, we may create instances of the class, which are called objects. An object has a set of attributes, which are data that belongs to the object. A class defines methods which may run on instances of the class.

In fact, the built-in types we've already seen (such as int, float, and bool) are actually classes. We can see this if we call the built-in type() function, which returns a value's type:

>>> type(3)
<class 'int'>
>>> type(5.0)
<class 'float'>

And values of those types (such as 3, 5.0, and True) are actually objects. In Python all values are objects (though that is not actually true in some other languages such as C++).

In this course, you can think of a type and a class as being the same thing. (There is actually a slight technical difference between these, but it is not important for our purposes now. We'll return to this topic in Programming 2.)

As a first example of writing a class, let's create a class Point. This is the smallest possible class definition in Python:

class Point:
    pass

In Python, the 'pass' statement does nothing. It's needed here, since a class definition may not be completely empty.

We may now create objects which are instances of class Point, and assign attributes to them:

>>> p = Point()
>>> p.x = 3
>>> p.y = 4
>>> q = Point()
>>> q.x = 10
>>> q.y = 20
>>> p.x
3
>>> q.y
20

Above, the function Point() is a constructor function that we can call to create a Point object.
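
Each object keeps its own attributes, and reading an attribute that was never assigned raises an AttributeError. A minimal sketch:

```python
class Point:
    pass

p = Point()
p.x = 3
q = Point()               # q has no attributes yet

print(hasattr(p, 'x'))    # prints True
print(hasattr(q, 'x'))    # prints False

try:
    q.x
    outcome = 'no error'
except AttributeError:
    outcome = 'AttributeError'

print(outcome)            # prints AttributeError
```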

However, typically when we write any class we write an initializer method that takes the arguments (if any) that are passed to the constructor, and uses them to initialize the object, typically by creating attributes. For example:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

In Python, the name __init__ is special: it means that this method is an initializer, and will run automatically when a new instance of this class is created.

Every method has a parameter list that begins with a parameter that receives the object on which the method was invoked. Traditionally in Python this parameter is called 'self' (though actually it may have any name). In an initializer method, this parameter receives the object that is being created.

Let's create a couple of Point objects using this new initializer:

>>> p = Point(3, 4)
>>> q = Point(10, 20)
>>> p.x
3
>>> q.y
20

Notice in this example that when we call the constructor, we pass only two arguments, but the parameter list in __init__ has three parameters. That's because 'self' is an extra parameter that is passed automatically.

Let's add a few methods to the Point class. It will now look like this:

import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    # Return the distance from this point to the origin.
    def from_origin(self):
        return math.sqrt(self.x ** 2 + self.y ** 2)

    # Return True if this point is the origin (0, 0).
    def is_origin(self):
        d = self.from_origin()
        return d == 0

    # Return the distance between this point and another point q.
    def distance(self, q):
        return math.sqrt((self.x - q.x) ** 2 + (self.y - q.y) ** 2)

Notice that

from_origin() accesses the attributes 'x' and 'y' of the object 'self'
is_origin() calls the from_origin() method on the same object that it was invoked on
distance() accesses the attributes 'x' and 'y' of 'self', and also those same attributes of another Point object 'q'

Here is how we can use our class:

>>> p = Point(3, 4)
>>> p.from_origin()
5.0
>>> p.is_origin()
False
>>> q = Point(10, 12)
>>> p.distance(q)
10.63014581273465
>>> q.distance(p)
10.63014581273465
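
A call such as p.distance(q) is equivalent to calling the function through the class and passing the object explicitly, which shows how 'self' receives p. The sketch below repeats a trimmed-down Point so that it runs on its own:

```python
import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    # Return the distance between this point and another point q.
    def distance(self, q):
        return math.sqrt((self.x - q.x) ** 2 + (self.y - q.y) ** 2)

p = Point(0, 0)
q = Point(3, 4)
print(p.distance(q))           # prints 5.0
print(Point.distance(p, q))    # prints 5.0: here p is passed as 'self' explicitly
```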

Here's another class that we wrote in the lecture:

class Line:
    def __init__(self, p, q):     # create a line from p to q
        self.p = p
        self.q = q

    def length(self):
        return self.p.distance(self.q)

Let's try it:

>>> p = Point(3, 4)
>>> q = Point(10, 12)
>>> l = Line(p, q)
>>> l.length()
10.63014581273465

Notice that the __init__ method in the Point and Line classes above simply puts initializer arguments into attributes with the same name. That is a common behavior for simple classes such as these. However, an __init__ method can do anything you like! For example, it may transform the initializer arguments in some way or create attributes with different names. We'll see examples of such initializers in classes that we'll write later (including in our algorithms course).
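
As a small sketch of an initializer that transforms its argument, consider a hypothetical class Circle (not one from the lecture) that is created from a diameter but stores a radius:

```python
import math

class Circle:
    def __init__(self, diameter):
        # The attribute has a different name than the argument,
        # and holds a transformed value.
        self.radius = diameter / 2

    def area(self):
        return math.pi * self.radius ** 2

c = Circle(10)
print(c.radius)    # prints 5.0
```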

Vector class

As a further example, let's write a class Vec that can represent a vector of arbitrary dimension.

class Vec:
    def __init__(self, *args):
        self.a = args

We've already seen that a parameter such as "*args" allows a function or method to accept an arbitrary number of arguments, which are gathered into a single tuple. So our initializer sets the attribute 'a' to hold a tuple of values in the vector:

>>> v = Vec(2.0, 4.0, 5.0)
>>> v.a
(2.0, 4.0, 5.0)

Let's now add a method length() for computing a vector's length, plus a method add() for adding two vectors of the same dimension:

# These methods go inside class Vec; the file also needs 'import math' at the top.
def length(self):
    s = 0
    for x in self.a:
        s += x * x
    return math.sqrt(s)

def add(self, w):
    assert len(self.a) == len(w.a), 'vectors must have same dimension'
    sum = []
        
    for i in range(len(self.a)):
        sum.append(self.a[i] + w.a[i])
        
    return Vec(*sum)

The add() method needs to return a vector, so it calls the Vec() constructor to make one. In this call, it uses the '*' operator to explode the values from 'sum' into separate arguments, because the initializer function expects each coordinate to be a separate argument. (The initializer will gather all these arguments back into a tuple.)

Now we can add Vec objects:

$ py -i vector.py 
>>> v = Vec(2.0, 4.0, 5.0)
>>> w = Vec(1.0, 2.0, 3.0)
>>> z = v.add(w)
>>> z.a
(3.0, 6.0, 8.0)
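
For reference, here is a sketch of how the pieces above might fit together in one file. The notes show the methods separately; placing them inside the class body, with 'import math' at the top, is an assumption. The last part shows what happens when the dimensions do not match:

```python
import math

class Vec:
    def __init__(self, *args):
        self.a = args        # a tuple of coordinates

    def length(self):
        s = 0
        for x in self.a:
            s += x * x
        return math.sqrt(s)

    def add(self, w):
        assert len(self.a) == len(w.a), 'vectors must have same dimension'
        sum = []
        for i in range(len(self.a)):
            sum.append(self.a[i] + w.a[i])
        return Vec(*sum)

v = Vec(3.0, 4.0)
print(v.length())    # prints 5.0

# Adding vectors of different dimensions fails the assertion in add().
try:
    v.add(Vec(1.0, 2.0, 3.0))
    outcome = 'no error'
except AssertionError as e:
    outcome = str(e)

print(outcome)       # prints vectors must have same dimension
```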