Week 3: Notes

functions and methods

Python includes both functions and methods in its standard library.

A function takes zero or more arguments and optionally returns a value. Some of Python's built-in functions that we've already seen in this course are len(), chr(), ord(), input() and print(). To call a function, we simply write its name followed by the arguments in parentheses:

name = input('Enter your name: ')

A method is like a function, but is invoked on a particular object. For example:

s = 'yoyo'
b = s.startswith('yo')  # method call

In the second line above, we are invoking (or calling) the startswith() method on the string s. We pass the string 'yo' to the method. The method returns a value, which is True in this case since the string 'yoyo' does start with 'yo'.
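
As a small sketch of the difference in call syntax (the variable names n, b and e are chosen only for this illustration), we can compare a built-in function with two string methods:

```python
s = 'yoyo'

n = len(s)                # function call: the string is passed as an argument
b = s.startswith('yo')    # method call: invoked on the string s
e = s.endswith('ya')      # another method call on the same object

print(n, b, e)            # prints: 4 True False
```

The function receives s inside the parentheses, whereas each method is written after s and a dot.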

Above, we said that a method is invoked on an object. In Python any value is an object, so (for example) 3, False, and 'yoyo' are all objects. (In some other languages, there is a technical distinction between objects and other kinds of values.) Methods (and a related feature, classes, which we'll discuss later) are fundamental building blocks in object-oriented programming. A language that has methods and classes, such as Python, C++, Java, or C#, is called object-oriented.

Not all programming languages have both functions and methods. For example, C has only functions, and classic Java has only methods. Python is a bit of a hybrid since it has both functions and methods. This arguably makes the language more flexible and convenient (at the cost of some complexity).

In this course we will soon learn how to write our own functions, and before too long we'll learn how to write our own methods (and classes) as well.

more string methods

Python's library contains many more useful methods on strings. For example, .lower() converts all characters in a string to lowercase:

>>> s = 'YUMMY PIE'
>>> s.lower()
'yummy pie'
>>> s
'YUMMY PIE'

Notice that the call to lower() above returned a new string that was like s, but in which all characters are lowercase. It did not modify the original string s, which still contains uppercase characters. In fact, it could not possibly modify s, since Python strings are immutable. Many string methods are similar to lower() in that they return a new string derived by modifying a given string in some way.
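
Here is a short sketch of what immutability means in practice. It uses replace(), another string method that returns a new string, and a try/except block (a feature not yet covered in these notes) only to show that assigning to a character of a string fails:

```python
s = 'YUMMY PIE'
t = s.lower()                   # t is a new string
u = s.replace('PIE', 'CAKE')    # u is another new string

print(s)                        # YUMMY PIE (unchanged)
print(t)                        # yummy pie
print(u)                        # YUMMY CAKE

# Strings are immutable, so assigning to a character raises TypeError.
try:
    s[0] = 'G'
    modified = True
except TypeError:
    modified = False

print(modified)                 # False
```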

Our quick reference lists more string methods and operators. Note that strings are iterable, since you can loop over them with 'for'. They are also sequences, since you can access string elements using the syntax s[i]. Soon we'll see other kinds of iterables and sequences. (In Python every sequence is iterable, but some iterables such as sys.stdin are not sequences). So if you're looking in the quick reference for operations that work on strings, you can find them in three places: in the section on iterables, in the section on sequences, and in the section specifically about strings.

reading lines from standard input

We've already seen Python's input() and print() functions. input() reads a line from standard input. print() writes a line to standard output. By default standard input and output are the terminal, but we'll soon see that we can redirect them from or to a file.

At some point the input may reach its end. When the input comes from a terminal, we can signal the end of the input by typing Ctrl+D (on Linux or macOS) or Ctrl+Z Enter (on Windows).

Let's write a program that reads numbers from standard input, one per line, until the input ends. The program will print the sum of the numbers. We could try to use input() to read the numbers, but when the input ends input() will raise an error (EOFError), which is inconvenient. Instead, we can use the object sys.stdin, which represents a program's standard input. We can loop over sys.stdin using a 'for' statement. On each iteration, we'll receive the next input line as a string:

import sys
  
sum = 0
for line in sys.stdin:  # for each line of standard input
    n = int(line)       # convert string to integer
    sum += n
  
print('The sum is', sum)

Let's run the program, and give it some numbers as input:

$ python sum.py
3
4
5
^D

At the end we typed ^D (control D), meaning end of input. The program now prints

The sum is 12
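
If you'd like to experiment with this loop without typing input by hand, one option is to substitute an in-memory text stream for sys.stdin. The sketch below assumes io.StringIO from the standard library as a stand-in; the names fake_input and total are made up for this illustration:

```python
import io

# A stand-in for standard input: three lines, each ending in a newline.
fake_input = io.StringIO('3\n4\n5\n')

total = 0
for line in fake_input:     # iterates line by line, just like sys.stdin
    n = int(line)           # int() ignores the trailing newline
    total += n

print('The sum is', total)  # The sum is 12
```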

redirecting input and output

When you run a program you may redirect its standard input to come from a file, and may also redirect standard output to a file.

Use the '<' character to redirect the input. For example, let's use a text editor to create a file test.in with these contents:

4
5
6

Now let's run the Python program from the previous section, redirecting its input from test.in:

$ python sum.py < test.in
The sum is 15

Similarly, you can use the > character to redirect a program's output. Let's run the program again, redirecting both the input and output:

$ python sum.py < test.in > test.out
$

The program ran, but produced no output on the terminal since its output was redirected. Let's look at the output it produced. You can view it in an editor. Alternatively, the 'cat' command (on Linux or macOS) will display the contents of a file:

$ cat test.out
The sum is 15
$

Many of our homework assignments in our ReCodEx system will contain sample input(s) and sample output(s) for the program that you're supposed to write. You may want to place each sample input in a text file. Then you can run your program with its input redirected from each file in turn. That will be much more convenient than manually entering input data each time you run your program.

processing newline characters

Earlier, we saw that we can loop over sys.stdin to read lines from a file. You should be aware that when you do this, each string you receive will end with a newline character. Consider this program print.py, which reads all lines from standard input and copies them to standard output:

import sys

for line in sys.stdin:
    print(line)

Suppose that we have a text file story.txt with three lines:

the beginning
the middle
the end

Let's run the program above and redirect its input from this file:

$ python print.py < story.txt
the beginning

the middle

the end

$

Notice the extra blank lines after each output line. As mentioned above, each string generated by the 'for' loop will end with a newline character. For example, the first line read from the file will be 'the beginning\n'. (As we saw in an earlier section, on Windows the file will actually contain '\r\n' at the end of the line, but Python will convert this sequence to '\n'.) When we invoke print() on this string, it prints the newline in the string, and then prints a second newline because print() normally prints a newline after any output string you give it.

If we don't want the extra lines, we can call the strip() method to remove the newlines returned by 'for'. strip() removes all whitespace at the beginning and end of a string. Whitespace includes unprintable characters such as spaces and newlines:

>>> '   one   two   three   '.strip()
'one   two   three'
>>> 'down the street\n'.strip()
'down the street'

Let's modify the program print.py above so that it strips each line read from standard input:

import sys

for line in sys.stdin:
    line = line.strip()
    print(line)

Now it won't print extra blank lines:

$ python print.py < story.txt
the beginning
the middle
the end
$ 

Alternatively, if we want to remove only the newline character at the end of the line but leave all other whitespace intact, then instead of calling strip() we could call

line = line[:-1]
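
One caveat: line[:-1] assumes that every line really ends with a newline. The last line of a file may lack one, in which case the slice would cut off a real character. A possible alternative (offered here as a sketch, not as the only approach) is rstrip('\n'), which removes trailing newline characters only if they are present. The two sample strings below are invented for the illustration:

```python
with_newline = '  indented text  \n'
without_newline = 'last line'

# Slicing always drops the final character, whatever it is.
a = with_newline[:-1]
b = without_newline[:-1]

# rstrip('\n') drops only trailing newline characters.
c = with_newline.rstrip('\n')
d = without_newline.rstrip('\n')

print(repr(a))    # '  indented text  '
print(repr(b))    # 'last lin'
print(repr(c))    # '  indented text  '
print(repr(d))    # 'last line'
```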

string formatting

Python includes f-strings, which are formatted strings that can contain interpolated values. For example:

>>> color1 = 'blue'
>>> color2 = 'green'
>>> f'The sky is {color1} and the field is {color2}'
'The sky is blue and the field is green'

Write the character f immediately before a string to indicate that it is an f-string.

Interpolated values can be arbitrary expressions. For example, consider a program that reads two values and prints their sum. Without an f-string, we might write

x = int(input('Enter x: '))
y = int(input('Enter y: '))

print('The sum of', x, 'and', y, 'is', x + y)

Using an f-string, we may instead write the last line like this:

print(f'The sum of {x} and {y} is {x + y}')

In my opinion, this is easier to read and write.

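You may optionally specify a format code after each interpolated value, separated from it by a colon, to indicate how it should be rendered as a string. Some common format codes include 'd' (decimal integer), 'f' (fixed-point floating-point number), and 'x' (hexadecimal integer).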

For example:

>>> m = 127
>>> f'hex value is {m:x}'
'hex value is 7f'

>>> import math
>>> math.pi
3.141592653589793
>>> f'pi is {math.pi:.3f}'
'pi is 3.142'

Notice that Python rounds (rather than truncates) a floating-point number to a given number of digits.

You can specify a comma (',') before a 'd' or 'f' format code to specify that digits should be printed in groups of 3, with a comma between groups. (The comma is not accepted with the 'x' format code.)

>>> x = 2 ** 100
>>> f'{x:,d}'
'1,267,650,600,228,229,401,496,703,205,376'

An integer preceding a format code indicates a field width. If the value's width in characters is less than the field width, it will be padded with spaces on the left:

>>> f'{23:9d}'
'       23'
>>> f'{723:9d}'
'      723'
>>> f'{72377645:9d}'
' 72377645'

If the field width is preceded with a '0', then the output will be padded with zeroes instead:

>>> f'{23:09d}'
'000000023'
>>> f'{723:09d}'
'000000723'
>>> f'{72377645:09d}'
'072377645'
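
Field widths are handy for lining up columns of output. The sketch below combines a width with a precision, and also shows that strings (unlike numbers) are padded on the right by default. The values item, count and price are invented for this illustration:

```python
item = 'kiwi'
count = 7
price = 0.25

# item in a field of width 10, count in width 5, price in width 8
# with 2 digits after the decimal point.
line = f'{item:10}{count:5d}{price:8.2f}'
print(line)
print(len(line))            # 23, i.e. 10 + 5 + 8

# A leading 0 also works together with a precision.
padded = f'{3.14159:08.3f}'
print(padded)               # 0003.142
```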

There are many more format codes that you can use to control output formatting in more detail. See the Python library documentation for a complete description of these.

lists

Lists are a fundamental type in Python. We can make a list by specifying a series of values surrounded by square brackets:

l = [3, 5, 9, 11, 15]

A list may contain values of various types:

l = ['horse', 789, False, -22.3]

It may contain any number of values, or may even be empty:

l = []

The len function returns the number of elements in a list:

len(['potato', 'tomato', 'tornado'])    # returns 3

We can access elements of a list by index. The first element has index 0, and the last element has index len(l) - 1:

>>> l = [3, 5, 9, 11, 15]
>>> l[0]
3
>>> l[4]
15

Just like with strings, we can use negative indices to count from the end of the list:

>>> l = [3, 5, 9, 11, 15]
>>> l[-1]
15
>>> l[-2]
11

Slice syntax works with lists, just like with strings:

>>> l = [3, 5, 9, 11, 15]
>>> l[1:3]
[5, 9]
>>> l[3:]
[11, 15]

The 'in' operator tests whether a list contains a given value:

>>> 77 in [2, 8, 77, 3, 1]
True

Note that this is a bit different than 'in' on strings. The 'in' operator does not test whether a sublist is present in a list:

>>> [8, 77] in [2, 8, 77, 3, 1]
False
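
To make the contrast concrete, here is a small sketch comparing 'in' on a string with 'in' on lists. The last line shows the one case where a list on the left does produce True, namely when that list is itself an element:

```python
r1 = 'ell' in 'hello'                  # substring test on a string
r2 = [8, 77] in [2, 8, 77, 3, 1]       # no element equals [8, 77]
r3 = [8, 77] in [2, [8, 77], 3, 1]     # the element at index 1 equals [8, 77]

print(r1, r2, r3)                      # True False True
```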

Unlike strings, lists in Python are mutable. We can set values by index:

>>> l = [3, 5, 9, 11, 15]
>>> l[0] = 77
>>> l[3] = 99
>>> l
[77, 5, 9, 99, 15]

more list operations

A list's length may change over time. The append() method adds a single element to a list:

>>> l = [3, 5, 9, 11, 15]
>>> l.append(20)
>>> l.append(30)
>>> l
[3, 5, 9, 11, 15, 20, 30]

We'll often use append() to build up a list in a loop. For example, we can build a list of the squares of all numbers from 1 to 10:

l = []
for i in range(1, 11):      # 1 .. 10
    l.append(i * i)

The extend() method adds a series of elements to a list. The += operator is a synonym for extend():

>>> l = [2, 4, 6]
>>> l.extend([8, 10])
>>> l
[2, 4, 6, 8, 10]
>>> l += [12, 14]
>>> l
[2, 4, 6, 8, 10, 12, 14]

The insert() method inserts an element into a list at a given position:

>>> l = [3, 5, 9, 11, 15]
>>> l.insert(2, 88)
>>> l
[3, 5, 88, 9, 11, 15]

The del statement can delete one or more elements of a list by index:

>>> l = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'grape']
>>> del l[1]
>>> l
['orange', 'pear', 'banana', 'kiwi', 'grape']
>>> del l[2:4]
>>> l
['orange', 'pear', 'grape']

We may even assign to a slice in a list, replacing that slice with an arbitrary sequence of values:

>>> l = [2, 4, 6, 8, 10]
>>> l[1:3]
[4, 6]
>>> l[1:3] = [100, 200, 300]
>>> l
[2, 100, 200, 300, 8, 10]

The list() function converts any iterable (and so any sequence) to a list:

>>> list('watermelon')
['w', 'a', 't', 'e', 'r', 'm', 'e', 'l', 'o', 'n']
>>> list(range(120, 130))
[120, 121, 122, 123, 124, 125, 126, 127, 128, 129]

Like strings, lists are iterable, so you can loop over them using 'for'. Lists are also sequences, since you can access their elements using the syntax l[i]. So you can find list operations in three sections in our quick reference guide, namely the sections about iterables, sequences, and specifically about lists.

lists are arrays

A Python list is actually an array, meaning a sequence of elements stored in contiguous memory locations. In fact it is a dynamic array, since it can expand over time. (In many programming languages, arrays have a fixed size).

For this reason, we can retrieve or update any element of a list by index extremely quickly, in constant time. Even if a list l has 1,000,000,000 elements, accessing e.g. l[927_774_282] will be extremely fast, just as fast as accessing the first element of a short list.

In our algorithms class, we will usually use the term "array" to describe this kind of data structure.

splitting and joining strings

The split() method is convenient for breaking strings into words. By default, it will consider words to be separated by sequences of whitespace characters, which include spaces, tabs, newlines, and other unprintable characters. It returns a list of strings:

>>> 'a big red truck'.split()
['a', 'big', 'red', 'truck']

>>> '   a     big    red   truck   '.split()
['a', 'big', 'red', 'truck']

Here's a program that reads a series of lines from standard input, each containing a series of integers separated by one or more spaces. It will print the sum of all the integers on all the input lines:

import sys

sum = 0
for line in sys.stdin:
    for word in line.split():
        sum += int(word)

print(sum)

You may optionally pass a string to split() specifying a delimiter that separates words instead of whitespace:

>>> 'big_red_truck'.split('_')
['big', 'red', 'truck']
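
Note that an explicit delimiter behaves differently from the default when delimiters are repeated: every single occurrence separates two fields, so runs of delimiters produce empty strings. A brief sketch (the sample strings are made up):

```python
s = 'a  big   truck'          # two spaces, then three spaces

default_split = s.split()     # runs of whitespace count as one separator
space_split = s.split(' ')    # each space separates two fields

print(default_split)          # ['a', 'big', 'truck']
print(space_split)            # ['a', '', 'big', '', '', 'truck']

csv_fields = '3,4,5'.split(',')
print(csv_fields)             # ['3', '4', '5']
```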

The method join() has the opposite effect: it joins a list of strings into a single string, inserting a given separator string between each pair of strings:

>>> ' '.join(['tall', 'green', 'tree'])
'tall green tree'

>>> '_'.join(['tall', 'green', 'tree'])
'tall_green_tree'
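
join() accepts only strings, so a list of numbers must be converted first. One way to do that with the tools we have so far is to build a list of strings in a loop, as in this sketch (the names numbers, words and line are only illustrative):

```python
numbers = [3, 5, 9]

words = []
for n in numbers:
    words.append(str(n))      # convert each number to a string

line = ', '.join(words)
print(line)                   # 3, 5, 9

# Splitting on the same separator recovers the list of strings.
parts = line.split(', ')
print(parts)                  # ['3', '5', '9']
```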

Here's a program that reads a single line, breaks it into words, reverses the words, then prints them back out:

words = input().split()       # break input into words
words = words[::-1]           # reverse them
print(' '.join(words))

Let's run it:

$ py rev.py
one fine day
day fine one
$ 

structural and reference equality

Suppose that we write the following assignments:

>>> l = [3, 5, 7, 9]
>>> m = l

Now the variables l and m refer to the same list. If we change l[0], then the change will be visible in m:

>>> l[0] = 33
>>> m[0]
33

This works because in Python every variable is a pointer to an object. So two variables can point to the same object, such as the list above. An assignment "m = l" does not copy a list. It runs in constant time, and is extremely fast.

Alternatively, we may make a copy of the list l. There are several possible ways to do that, all with the same effect:

>>> l = [3, 5, 7, 9]
>>> n = l.copy()      # technique 1: call the copy() method
>>> n = list(l)       # technique 2: call the list() function
>>> n = l[:]          # technique 3: use slice syntax

Now the list n has the same values as l, but it is a different list. Changes in one list will not be visible in the other:

>>> l[1] = 575
>>> l
[3, 575, 7, 9]
>>> n
[3, 5, 7, 9]

Python provides two different operators for testing equality. The first is the == operator:

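>>> x = [3, 5, 7, 9]
>>> y = x             # y refers to the same list as x
>>> z = x.copy()      # z is a separate list with the same elements
>>> x == y
True
>>> x == z
True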

This operator tests for structural equality. In other words, given two lists, it compares them element by element to see if they are equal. (It will even descend into sublists to compare elements there as well.)

The second equality operator is the is operator:

>>> x is y
True
>>> x is z
False

This operator tests for reference equality. In other words, it returns True only if its arguments actually refer to the same object. (Reference equality is also called physical equality.)

You may want to use each of these operators in various situations. Note that 'is' returns instantly (it runs in constant time), whereas == may traverse a list in its entirety, so it may be significantly slower.
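
The following sketch puts the two operators side by side on lists that contain a sublist. The names a, b and c are chosen just for this example:

```python
a = [1, [2, 3]]
b = a                 # b refers to the same object as a
c = [1, [2, 3]]       # c has equal contents but is a separate object

same_ab = (a == b, a is b)
same_ac = (a == c, a is c)
print(same_ab)        # (True, True)
print(same_ac)        # (True, False)

# == descends into sublists, so a change deep inside c is detected.
c[1][0] = 99
still_equal = (a == c)
print(still_equal)    # False
```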

nested lists

A list may contain any type of elements, including sublists:

>>> m = [[1], [2, 3, 4, 5], [6]]

A list of lists is a natural way to represent a matrix in Python. Consider this matrix with dimensions 3 x 3:

 5  11  12
 2   8   7
14   2   6

If we want to store it in Python as a list of lists, normally we will use row-major order, in which each sublist holds a row of the matrix:

m = [ [5, 11, 12], [2, 8, 7], [14, 2, 6] ]

Alternatively we could use column-major order, in which each sublist is a matrix column; then the first sublist would be [5, 2, 14]. The choice is arbitrary, but by convention we will generally use row-major order.
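
To see how the two orderings relate, here is a sketch that converts the row-major list of lists for the matrix above into column-major order using nested loops. The names rows, cols and col are introduced only for this illustration:

```python
rows = [[5, 11, 12], [2, 8, 7], [14, 2, 6]]    # row-major order

cols = []
for j in range(len(rows[0])):     # for each column index j
    col = []
    for row in rows:
        col.append(row[j])        # take element j of every row
    cols.append(col)

print(cols)     # [[5, 2, 14], [11, 8, 2], [12, 7, 6]]
```

With row-major order rows[i][j] is the element at row i, column j, while in the converted list the same element is found at cols[j][i].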

With this ordering, we can use the syntax m[i][j] to access the matrix element at row i, column j. Do not forget that rows and columns are numbered from 0:

>>> m = [ [5, 11, 12], [2, 8, 7], [14, 2, 6] ]
>>> m[1][0]    # row 1, column 0
2

Of course, we may use the index -1 to reference the last row or the last column:

>>> m[-1][-1] = 100
>>> m
[[5, 11, 12], [2, 8, 7], [14, 2, 100]]

Here's a program that will read a matrix from the input, with one row of numbers per input line:

# Read a matrix from the input, e.g.
#
# 2 3 4
# 5 1 8
# 0 2 9

import sys

m = []
for line in sys.stdin:
    # build a row of the matrix

    row = []
    for word in line.split():
        row.append(int(word))

    m.append(row)

print(m)

Now suppose that we want to build a zero matrix of a given size, i.e. a matrix whose elements are all 0. Recall that we may use the * operator to build a list of a given length by repeating a given element:

>>> 3 * [0]
[0, 0, 0]

So you might think that we can build e.g. a 3 x 3 matrix of zeros by

>>> m = 3 * [3 * [0]]
>>> m
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]

WARNING: This code looks correct but is not. Let's see what happens if we attempt to set the upper-left element of the matrix:

>>> m[0][0] = 7
>>> m
[[7, 0, 0], [7, 0, 0], [7, 0, 0]]

It appears that several matrix elements have changed!

Here's what's going on: all three of the sublists are actually pointers to the same list. Here is a similar example:

>>> a = [1, 2, 3]
>>> m = [a, a]
>>> m
[[1, 2, 3], [1, 2, 3]]
>>> a[0] = 7
>>> m
[[7, 2, 3], [7, 2, 3]]

The line m = [a, a] creates a list with two elements, each of which is a pointer to the list a. When a changes, the change is visible in m.

With that understanding, let's revisit our attempt to create a 3 x 3 matrix of zeros:

>>> m = 3 * [3 * [0]]
>>> m
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]

This code constructs a single list with three zeroes (3 * [0]) and then repeats it 3 times. However, the repetition does not copy the list; it merely makes three pointers to the same list. So an update to any matrix element will actually be visible in three places in the matrix.

Here's a correct way to make a 3 x 3 matrix of zeroes:

m = []
for i in range(3):
    m.append(3 * [0])

This may seem like more work, though later in this course we'll see how we can write even this form in a single line.
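
We can check the difference between the two constructions with the 'is' operator. In this sketch, bad and good are illustrative names for the incorrect and correct matrices:

```python
bad = 3 * [3 * [0]]           # three pointers to one row

good = []
for i in range(3):
    good.append(3 * [0])      # a fresh row on every iteration

shared_bad = bad[0] is bad[1]
shared_good = good[0] is good[1]
print(shared_bad)             # True: the rows are the same object
print(shared_good)            # False: the rows are distinct objects

good[0][0] = 7
print(good)                   # [[7, 0, 0], [0, 0, 0], [0, 0, 0]]
```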