Programming 1, 2022-3
Week 7: Notes

types, values, classes, objects

In Python we began by learning about various built-in types such as int, strings, and bools. Then, in the last lecture, we learned how to invent our own types by writing classes.

Actually a type and a class are pretty much the same thing. Every class that we write is a type, and in fact the built-in types are also classes, as you can see:

>>> int
<class 'int'>

(To be clear, there are technical distinctions between types and classes in statically typed languages, and also in Python with type annotations. However, we will not worry about those things at this point in this course.)

An object is an instance of a class. A value is anything that a variable can hold. In Python, these are the same thing: every value is an instance of some class. (In some other languages that is not the case.)

Usually we will use the terms 'value' and 'type' when we are talking about primitive types such as ints and bools, and the terms 'object' and 'class' when we are talking about user-defined classes.

class objects

In Python, every class is itself an object. For example, let's consider a simple class for representing points:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'P({self.x}, {self.y})'

>>> Point
<class '__main__.Point'>

This is how Python prints out a class. It is an object like any other. For example, we can put it into a variable:

>>> xyz = Point
>>> xyz
<class '__main__.Point'>

As we have seen, we can construct an instance of a class by calling it as if it were a function, passing any intializer arguments that we like:

>>> Point(3, 4)
P(3, 4)

In fact we can even construct instances of primitive types in this way:

>>> int()
0
>>> bool()
False

determining an object's type

The built-in function type() returns an object's type, which will be a class object:

>>> type(14)
<class 'int'>
>>> type('hello')
<class 'str'>

Let's look at an instance of the Point class:

>>> p = Point(3, 4)
>>> p
P(3, 4)
>>> type(p)
<class '__main__.Point'>

Sometimes we may wish to test whether an object has a given type. One way to do that is to call the type() function, then compare with a class object:

>>> p = Point(10, 20)
>>> type(p) is Point
True

Alternatively, we can call the built-in function isinstance():

>>> isinstance(p, Point)
True

Generally it's best to use isinstance() in this situation, since it will work even in the presence of subclassing (which we will discuss in a future lecture).

sets

A set in Python is a useful built-in data structure that represents a mutable set of values. For example, here is a set that initially contains three integers:

>>> s = {3, 5, 10}

We may add a value to a set using the add() method. If the value is already present, the call does nothing:

>>> s.add(4)
>>> s
{10, 3, 4, 5}
>>> s.add(4)
>>> s
{10, 3, 4, 5}

The remove() method removes a value from a set. If it's not present, it will report an error:

>>> s.remove(5)
>>> s
{10, 3, 4}
>>> s.remove(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 5

The 'in' operator tests whether a value is present in a set:

>>> s = { 'red', 'green', 'yellow', 'turquoise' }
>>> 'yellow' in s
True
>>> 'purple' in s
False

All of these operations are efficient: add(), remove(), and 'in' will run in O(1) on average. For this reason, a set is sometimes a better choice of data structure than a list, especially in situations when we want to be able to quickly test whether an element is present.

However, be aware that sets are unordered. In other words, the values in a set are not in any particular order, and you cannot access values by index:

>>> s = { 'red', 'green', 'yellow', 'turquoise' }
>>> s[2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'set' object is not subscriptable

Python will print a set's values in an arbitrary order, unrelated to the order in which values were added:

>>> s = set()
>>> s.add(5)
>>> s.add(2)
>>> s.add(10)
>>> s
{2, 10, 5}

As you can see above, the function set() called with no arguments creates an empty set. (Do not attempt to create an empty set by writing '{}'. That is an empty dictionary, not a set. We will discuss dictionaries below.)

Sets are iterable, meaning that you can use a 'for' statement to loop over a set's values (in some arbitrary order):

>>> s = { 'red', 'green', 'yellow', 'turquoise' }
>>> for c in s:
...   print(c)
... 
green
turquoise
red
yellow

Note that every value in a set must be immutable. So you cannot create a set of lists, for example:

>>> s = { [1, 2, 3], [6, 5, 8], [10, 11, 12] }
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

However, a set of tuples will work fine, since tuples are immutable:

>>> s = { (1, 2, 3), (6, 5, 8), (10, 11, 12) }
>>> s
{(6, 5, 8), (1, 2, 3), (10, 11, 12)}
>>> 

Python provides handy operators for performing Boolean operations on sets. The '&', '|', and '-' operators compute the intersection, union, or difference of two sets:

>>> s = { 5, 8, 2, 10, 3, 6, 12 }
>>> t = { 8, 77, 3, 14, 12 }
>>> s & t
{8, 3, 12}
>>> s | t
{2, 3, 5, 6, 8, 10, 12, 77, 14}
>>> s - t
{2, 10, 5, 6}

The set operators 's <= t' and 's >= t' test whether s is a subset or superset of t, respectively:

>>> s = { 2, 4, 6 }
>>> t = { 1, 2, 3, 4, 5, 6, 8 }
>>> s <= t
True
>>> t <= s
False

For more information about built-in operators and functions that work on sets, see our quick library reference.

dictionaries

A dictionary is a built-in Python data type that represents a mutable map from keys to values. Dictionaries are useful in a wide variety of situations.

Here's a dictionary that maps the names of some animals in English to their Czech equivalents:

>>> d = {'cat' : 'kočka', 'dog' : 'pes', 'guinea pig' : 'morče'}

The keys in this dictionary are the strings 'cat', 'dog' and 'guinea pig'. The corresponding values are 'kočka', 'pes' and 'morče'. Notice that each key and its value are separated by a colon (':') in the declaration above.

We may look up the value for any key:

>>> d['dog']
'pes'
>>> d['guinea pig']
'morče'

We can use an assignment to add a new key/value pair to the dictionary:

>>> d['cow'] = 'kráva'
>>> d
{'cat': 'kočka', 'dog': 'pes', 'guinea pig': 'morče', 'cow': 'kráva'}

We may use the same syntax to replace the value at any key:

>>> d['cat'] = 'gato'
>>> d
{'cat': 'gato', 'dog': 'pes', 'guinea pig': 'morče', 'cow': 'kráva'}

The 'del' statement will delete a key and its value from a dictionary:

>>> del d['dog']
>>> d
{'cat': 'gato', 'guinea pig': 'morče', 'cow': 'kráva'}

The 'in' operator tests whether a key is present in a dictionary:

>>> 'guinea pig' in d
True
>>> 'horse' in d
False

All of these operations are efficient. Looking up a key's value, replacing a key's value, deleting a key, and testing for a key will all run in O(1) on average.

Notice that all keys in a dictionary are unique. Values, however, may be repeated. Also note that dictionaries are indexed by key, but not by value. In other words, you can quickly look up the value for any key, but if you want to know whether a particular value is present in the dictionary, you must scan the entire dictionary, which takes O(N).

Python provides methods that let us conveniently access the keys and values in a dictionary. First, d.keys() returns an iterable object containing all keys in the dictionary. You can iterate over this object using 'for', or convert it to a list:

>>> d = {'cat' : 'kočka', 'dog' : 'pes', 'guinea pig' : 'morče'}
>>> for k in d.keys():
        print(k)
cat
dog
guinea pig
>>> list(d.keys())
['cat', 'dog', 'guinea pig']

Similarly, d.values() lists all values in a dictionary. Another helpful method is d.items(), which is an iterable of key-value pairs:

>>> list(d.items())
[('cat', 'kočka'), ('dog', 'pes'), ('guinea pig', 'morče')]

Recall that in a 'for' statement we may use pattern matching to assign to multiple variables on each loop iteration:

>>> for (x, y) in [(1, 2), (3, 4), (5, 6)]:
         print(x + y)
3
7
11

And so we may conveniently use pattern matching to loop over the keys and values of a dictionary simultaneously:

>>> d
{'cat': 'kočka', 'dog': 'pes', 'guinea pig': 'morče'}
>>> for key, value in d.items():
        print(f'key = {key}, value = {value}')
key = cat, value = kočka
key = dog, value = pes
key = guinea pig, value = morče

Note that dictionary keys, like values in a set, must be immutable. So you can't create a dictionary whose keys are lists:

>>> d = { [1, 2, 3]: 7, [5, 3, 2]: 8 }
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

However, tuples will work fine as dictionary keys:

>>> d = { (1, 2, 3): 7, (5, 3, 2): 8 }

See our quick library reference for more information about built-in operators and functions that work on dictionaries.

counting words

As an example of the practical utility of dictionaries, suppose that we'd like to determine which words in a file appear the most times. For example, here is the entire text of the classic novel War and Peace. Let's write a program to read the text and print out its most common words. We'll only consider words with at least 5 letters, since shorter common words (e.g. 'the', 'a') are not usually so interesting.

We'll assume that the text is in a file 'war_and_peace'. We'll read the file and construct a dictionary that maps each word to its count. After that, we'll extract a list of pairs (count, word) from the dictionary. Then we'll sort the list. Python orders tuples lexicographically, so that will sort by count since the count appears first in each pair.

Here is the program:

f = open('war_and_peace')

count = {}

for line in f:
    words = line.strip().split()
    for w in words:
        if len(w) >= 5:
            if w in count:
                count[w] += 1
            else:
                count[w] = 1

# Extract a list of pairs.
pairs = []
for word, n in count.items():
    pairs.append( (n, word) )

pairs.sort(reverse = True)

for n, word in pairs[:10]:
    print(f'{word}: {n}')

magic methods for operator overloading

Python's magic methods let us specify custom behavior for our classes. We have already seen two magic methods in Python: __init__ for initializing an object, and __repr__ for converting an object to a string.

Additional magic methods let us customize the behavior of the +, -, *, /, and // operators:

More similar operators exist; you can read about them in the official Python reference. But the ones listed here are sufficient for our purposes for now.

By the way, providing custom behavior for operators such as these is called operator overloading, and is possible in many (but not all) programming languages.

To see how these may be used, let's write a class Vector that can represent a vector with an arbitrary number of elements. We'll include a __repr__ method so that vectors print out nicely:

class Vector:
    def __init__(self, *args):
        self.a = args

    # Generate a string representation such as [3 5 10].
    def __repr__(self):
        w = []
        for x in self.a:
            w.append(str(x))
        return '[' + ' '.join(w) + ']'

We've already seen that a parameter such as "*args" allows a function or method to accept an arbitrary number of arguments, which are gathered into a single tuple. So our initializer sets the attribute 'a' to hold a tuple of values in the vector:

>>> v = Vector(2.0, 4.0, 5.0)
>>> v.a
(2.0, 4.0, 5.0)

We will now implement a magic method __add__ that allows the operator + to combine instances of our Vector class:

def __add__(self, w):
    assert len(self.a) == len(w.a), 'vectors must have same dimension'
    sum = []
        
    for i in range(len(self.a)):
        sum.append(self.a[i] + w.a[i])
        
    return Vector(*sum)

Notice that when we call the Vector constructor, we must use the '*' operator to explode the values from 'sum' into separate arguments, because the initializer function expects each coordinate to be a separate argument. (The initializer will gather all these arguments back into a tuple.)

Now we can add Vector objects using +:

$ py -i vector.py 
>>> v = Vector(2.0, 4.0, 5.0)
>>> w = Vector(1.0, 2.0, 3.0)
>>> z = v + w
>>> z
[3.0 6.0 8.0]

Behind the scenes, the '+' operator just calls the magic method, and in fact we can call it directly if we like:

>>> v.__add__(w)
[3.0 6.0 8.0]

Let's now implement the '*' operator. Actually there are two kinds of multiplication on vectors. We can multiply two vectors to compute their dot product, producing a scalar. Or we can multiply a vector by a scalar to produce another vector. We will implement both of these in a single method:

def __mul__(self, x):
    if isinstance(x, Vec):

        # Compute the dot product.
        s = 0
        for i in range(len(self.a)):
            s += self.a[i] * x.a[i]
        return s
    else:   # scalar multiplication
        b = []
        for i in range(len(self.a)):
            b.append(x * self.a[i])
        return Vec(*b)

It works:

>>> v = Vec(2, 4, 5)
>>> v * v
45
>>> v * 10
[20 40 50]

Note, however, that with our existing code we cannot perform the scalar multiplication in the other direction:

>>> 10 * v
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for *: 'int' and 'Vec'

That's because Python will only invoke the __mul__ magic method (and similar methods) on the object that is on the left side of the multiplication operator.

However, there is an additional magic method __rmul__ that will be invoked on the object on the right side of the multiplication operator. (Similar methods __radd__, __rsub__ and so on also exist.) So let's implement __rmul__ now. Our implementation is simple:

def __rmul__(self, x):
    return self * x

Now we can multiply a scalar by a vector in that order:

>>> v = Vec(2, 4, 5)
>>> 10 * v
[20 40 50]

How does this work? When we write 10 * v, Python calls __rmul__(v, 10) since v is on the right. Then our line 'return self * x' computes v * 10, which in turn automatically invokes the magic method __mul__(v, 10).