Some of today's topics are covered in these sections of Think Python:
8. Strings
Here are some additional notes.
We have seen various operators in Python, including arithmetic operators (+, -, *, /, //, %, **), comparison operators (<, <=, >, >=, ==, !=), and boolean operators (and, or, not).
When you write an expression involving several operators, its meaning may be ambiguous. For example:
x = 3 + 4 * 5
Does this mean (3 + 4) * 5? Or 3 + (4 * 5)?
Here's another example:
if x > 3 and x < 10 or y > 5: …
Does this mean (x > 3 and x < 10) or y > 5? Or does it mean x > 3 and (x < 10 or y > 5), which is logically different?
To resolve ambiguities such as these, Python uses a fixed ordering of operator precedence. Here are the operators we've seen so far, ranked in order from highest precedence to lowest precedence:
**
*, /, //, %
+, -
<, <=, >, >=, ==, !=
not
and
or
Operators with higher precedence bind more tightly. For example, 3 + 4 * 5 means 3 + (4 * 5), since * binds more tightly than +. (If you like, you can imagine the '*' operator pulling its operands together like a magnet.) Similarly,
x > 3 and x < 10 or y > 5
means
(x > 3 and x < 10) or y > 5
since 'and' has higher precedence than 'or'.
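As a quick check of these rules, here is a small sketch that compares each expression with its explicitly parenthesized form. The values of x and y are made up for illustration; they are chosen so that the two possible groupings of the 'and'/'or' expression give different answers.

```python
# * binds more tightly than +, and ** binds more tightly than *
print(3 + 4 * 5 == 3 + (4 * 5))      # True
print(2 * 3 ** 2 == 2 * (3 ** 2))    # True

x = 1    # sample values, chosen only for illustration
y = 7

implicit = x > 3 and x < 10 or y > 5
and_first = (x > 3 and x < 10) or y > 5    # how Python groups the expression
or_first = x > 3 and (x < 10 or y > 5)     # the other possible grouping

print(implicit, and_first, or_first)       # True True False
```

When in doubt, adding parentheses costs nothing and makes the intended grouping obvious to the reader.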
So far we've seen four basic types in Python: integers, floats, booleans, and strings. Each of these types has an associated type conversion function:
int(x) converts x to an integer. Any fractional part is discarded; for example, int(45.228) will produce 45. True will become 1, and False will become 0.
float(x) converts x to a floating-point number. (If x is an enormous integer, it may not fit, in which case an error will occur.)
bool(x) converts x to a boolean value. Any non-zero number will become True, and zero will become False. If x is a string, the resulting value will be True if the string is non-empty, or False if it is empty.
Note that the 'if' statement automatically converts its condition to a boolean in the same way that bool() does. So, for instance, if 'i' is an integer then the following 'if' statements will behave identically:
if i != 0:
    … do something …

if i:
    … do something …
str(x) converts x to a string.
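The following short sketch tries out each conversion function on a few sample values. The values themselves are arbitrary and chosen only to illustrate the rules described above.

```python
# Conversions to int
print(int(45.228))              # 45: the fractional part is discarded
truncated = int(-3.7)
print(truncated)                # -3: truncation is toward zero
print(int(True) + int(False))   # 1
print(int(' 12\n'))             # 12: surrounding whitespace is ignored

# Conversions to float, bool and str
print(float(7))                 # 7.0
print(bool(0), bool(-2.5))      # False True
print(bool(''), bool('0'))      # False True: '0' is a non-empty string
print(str(3.5) + '!')           # 3.5!
```

Note that bool('0') is True: for strings, only emptiness matters, not the characters the string contains.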
For our first peek into Python's enormous standard library, we will see how to use Python's built-in math functions. To get access to these, write this at the top of your program:
import math
These functions include, for example:
fabs(x) - absolute value
sqrt(x) - square root
exp(x) - return e raised to the power x
log(x) - return the natural (base e) logarithm of x
sin(x), cos(x), tan(x) - trigonometric functions
Our Python Library Quick Reference lists these functions and others. Also, you can see a full list in the Python library documentation.
To use any of these functions, write "math." followed by the name of the function. For example:
import math

print(math.sqrt(2))
prints
1.4142135623730951
Our quick library reference also lists various functions that can generate random numbers. To use these, you must first write 'import random'. We will often use these as well.
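Here is a brief sketch that calls several of the math functions listed above, plus two functions from the random module. The choice of random.randint() and random.random() is an assumption made for illustration: both exist in Python's standard library, but the quick reference may emphasize others.

```python
import math
import random

print(math.fabs(-3))       # 3.0 (fabs always returns a float)
print(math.exp(0))         # 1.0
print(math.sin(0))         # 0.0
print(math.sqrt(16))       # 4.0

roll = random.randint(1, 6)    # random integer from 1 to 6, both ends included
fraction = random.random()     # random float, at least 0.0 and less than 1.0
print(roll, fraction)
```

The last line prints different values on each run, since the numbers are chosen at random.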
Computers generally store text using a coded character set, which assigns a unique number called a code point to each character. Two character sets are used in virtually all software systems today.
First, the ASCII character set includes only 128 characters; its code points range from 0 to 127. For example, in ASCII the character 'A' has the number 65, and 'B' has the number 66. ASCII includes all the characters you see on a standard English-language keyboard: the uppercase and lowercase letters A-Z/a-z of the Latin alphabet, the digits 0-9 and various punctuation marks such as $, % and &. ASCII does not include accented characters such as č or ř.
You can see a table of all ASCII characters at asciitable.com.
Note that ASCII includes various whitespace characters, which are not visible on the printed page. We will encounter a few of these from time to time:
A tab character (ASCII code 9) moves the output position to the next tab stop.
A newline character (ASCII code 10) moves to the next line. In text files on Linux and macOS, each line ends with an instance of this character.
A space (ASCII code 32) is used throughout text to separate words.
The newer Unicode character set extends ASCII to include characters from virtually all of the world's writing systems, including accented characters and also ideographic characters in Asian languages such as 日. Code points in Unicode range from 0 to 1,114,111.
The site unicode-table.com has a large table showing all the Unicode characters that exist.
Python 3 is fully compatible with Unicode. You can write
s = 'Řehoř'
or
s = '人'
and these strings will work just like strings of ASCII characters.
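As a small illustration, the sketch below applies a few ordinary string operations to non-ASCII strings. It assumes the source file is saved as UTF-8 (the default in Python 3) and that each accented letter is stored as a single code point.

```python
s = 'Řehoř'
print(len(s))         # 5: each accented letter counts as one character
print(s[0], s[-1])    # Ř ř
print(s.upper())      # ŘEHOŘ

t = '人'
print(len(t))         # 1
print(s + ' ' + t)    # strings containing different scripts can be combined freely
```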
Python includes two functions that can map between characters and their integer code points.
Given a Unicode character c, ord(c) returns its code point. For example, ord('A') is 65, and ord('B') is 66. ord('ř') is 345 (a value outside the ASCII range).
chr() works inversely: it maps a code point to a character. For example, chr(65) is 'A', and chr(345) is 'ř'.
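The following sketch shows that ord() and chr() undo each other. The sample characters are arbitrary.

```python
print(ord('A'), ord('B'), ord('ř'))    # 65 66 345
print(chr(65), chr(66), chr(345))      # A B ř

# Converting a character to its code point and back gives the same character.
round_trip_ok = True
for c in 'Ař人':
    if chr(ord(c)) != c:
        round_trip_ok = False
print(round_trip_ok)                   # True

print(ord('人'))                       # 20154, far outside the ASCII range
```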
These functions are sometimes useful when we wish to manipulate characters. For example, here is a program that reads a lowercase letter, and prints the next letter in the alphabet:
c = input('enter letter: ')
i = ord(c) - ord('a')
if 0 <= i < 26:
    i = (i + 1) % 26
    print('next letter is', chr(ord('a') + i))
else:
    print('not a lowercase letter')
The program uses ord() to convert a character (such as 'd') to a number (such as 3) representing its position in the lowercase alphabet. It then adds 1 (mod 26), and uses chr() to map the result back into a lowercase letter.
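To see the wrap-around from 'z' back to 'a', here is a variant of the same calculation that runs over a few fixed sample letters instead of reading input. The sample letters and the variable names are chosen only for illustration.

```python
results = ''
for c in 'adz':
    i = ord(c) - ord('a')                  # position in the alphabet: 0 for 'a', 25 for 'z'
    nxt = chr(ord('a') + (i + 1) % 26)     # advance by one, wrapping from 25 back to 0
    print(c, '->', nxt)
    results += nxt

print(results)    # bea
```

Without the "% 26", the letter after 'z' would come out as '{', which is the character that follows 'z' in ASCII.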
On all major operating systems, as a program runs it can read from its standard input. (Usually standard input comes from the terminal, but it is also possible to redirect it to come from a file instead.)
In Python, we will often want to read lines from standard input. The sys.stdin object is a sequence of lines, and so we can loop over it using 'for'.
For example, here is a program that reads numbers from standard input, one per line, and computes their sum:
import sys

sum = 0
for line in sys.stdin:
    n = int(line)    # convert string to integer
    sum += n
print('The sum is', sum)
When we run the program and enter its input from a terminal, we need some way to signal that the input is complete. On Linux or macOS, we can do this by typing Ctrl+D. On Windows, type Ctrl+Z followed by Enter.
When we run the program, we see this:
3
4
5
The sum is 12
Above, we typed Ctrl+D or Ctrl+Z after the number 5 (though that was not visible in the terminal output).
Note that when you loop over sys.stdin in this way, each line will be a string that contains a newline character at the end of it. By contrast, when you read a string using input() it will not have a newline at the end. The example above works because the int() function will ignore whitespace (such as a newline) at the end of a string.
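The trailing newline is easy to observe directly. In the sketch below, an io.StringIO object stands in for sys.stdin so that the example runs without typed input. That substitution is an assumption made purely for illustration; a real program would loop over sys.stdin itself.

```python
import io

fake_stdin = io.StringIO('3\n4\n5\n')    # stand-in for sys.stdin (illustration only)

total = 0
last_line = ''
for line in fake_stdin:
    last_line = line
    total += int(line)        # int() ignores the trailing newline

print(repr(last_line))        # '5\n': the newline is part of the string
print(last_line.strip())      # 5: strip() removes surrounding whitespace
print('The sum is', total)    # The sum is 12
```

If you need the text of a line itself rather than a number, calling strip() on it first avoids surprises when comparing or printing it.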
In programming it is common to use a nested loop, i.e. a loop inside a loop. For example, here's a program that prints out a rectangle of asterisks:
x = int(input('Enter x: '))
y = int(input('Enter y: '))
for i in range(y):
    s = ''
    for j in range(x):
        s += '*'    # add a '*' to s
    print(s)
The output looks like this:
Enter x: 7
Enter y: 3
*******
*******
*******
Note that the inner loop ("for j in range(x)") runs in its entirety on every iteration of the outer loop ("for i in range(y)"). So the total number of asterisks printed is x ⋅ y.
We can also change the bounds of the inner loop on each iteration of the outer loop. For example, here is a program to print a triangle of asterisks:
n = int(input('Enter size: '))
for i in range(1, n + 1):
    s = ''
    for j in range(i):
        s += '*'
    print(s)
The output looks like this:
Enter size: 5
*
**
***
****
*****
In each of these examples we've used a doubly nested loop. Of course, loops may be triply nested, or nested even more deeply.
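As one possible illustration of a triply nested loop, the sketch below searches for right-triangle side lengths a < b < c with a² + b² = c², where no side exceeds a limit. The problem and the limit of 20 are chosen only as an example. As in the triangle program, the bounds of each inner loop depend on the variable of the loop around it.

```python
limit = 20    # arbitrary upper bound, chosen for illustration
count = 0
for a in range(1, limit + 1):
    for b in range(a + 1, limit + 1):          # b starts just above a
        for c in range(b + 1, limit + 1):      # c starts just above b
            if a * a + b * b == c * c:
                print(a, b, c)
                count += 1
print('found', count)    # found 6
```

The first line printed is "3 4 5". The innermost test runs once for every combination of a, b and c, so the running time grows quickly as the limit increases.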
Earlier we learned about the 'break' statement, which causes a loop to terminate immediately. A related statement is 'continue', which aborts the current iteration of a loop and continues with the next iteration.
For example, the following loop computes the sum of the numbers from 1 to 100, and also the sum of the squares of those numbers. However, it uses the 'continue' statement to skip the number 28:
sum = 0
sum_squares = 0
for i in range(1, 101):
    if i == 28:
        continue
    sum += i
    sum_squares += i * i
print(sum)
print(sum_squares)
In this particular example, we could easily replace 'continue' with a comparison using '!=':
for i in range(1, 101):
    if i != 28:
        sum += i
        sum_squares += i * i
However 'continue' is sometimes convenient in situations where more code follows it, to avoid moving all that code into a nested block.
Still, we will use 'break' much more often than 'continue'.
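Here is a sketch of a situation where 'continue' keeps the code flat: several checks at the top of the loop body each skip unwanted input, and the main work follows without extra indentation. The sample text and the rule that lines starting with '#' are comments are both made up for this example.

```python
text = '10\n\n# a comment\n20\n30\n'    # sample input, invented for illustration

total = 0
count = 0
for line in text.splitlines():
    line = line.strip()
    if line == '':
        continue              # skip blank lines
    if line.startswith('#'):
        continue              # skip comment lines
    # The main work is not nested inside any 'if'.
    total += int(line)
    count += 1

print(total, count)    # 60 3
```

Writing the same loop without 'continue' would require placing the last two statements inside a nested 'if' block.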
Let's consider more string operations in Python. First, the len() function will give us the length of a string:
>>> len('zmrzlina')
8
We can use the [] operator to extract individual characters of a string s. The first character is s[0], the second is s[1], and so on:
>>> s = 'zmrzlina'
>>> s[0]
'z'
>>> s[1]
'm'
A negative index retrieves a character from the end of the string. For example, s[-1] is the last character, and s[-2] is the second last:
>>> s[-1]
'a'
>>> s[-2]
'n'
The syntax s[i : j] returns a slice (i.e. substring) of elements from s[i] up to (but not including) s[j]. Either i or j may be negative to index from the end of the sequence:
>>> s[0:4]
'zmrz'
>>> s[4:5]
'l'
>>> s[3:-1]
'zlin'
In s[i : j], if the start index i is omitted, it is 0, i.e. the beginning of the string. If the end index j is omitted, it is len(s), i.e. the end of the string:
>>> s[:4]
'zmrz'
>>> s[4:]
'lina'
The syntax s[i : j : k] will extract a slice of characters in which the index advances by k at each step. For example, if we use k = 2 then we will retrieve alternate characters:
>>> s[0:7:2]
'zrln'
Note that the step value can even be negative:
>>> s[6:2:-1]
'nilz'
If the step value is negative, then an empty start index refers to the end of the string, and an empty end index refers to the beginning. So you can reverse a string by specifying a step value of -1 and providing neither a start nor an end index:
>>> s[::-1]
'anilzrmz'
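The slicing rules above combine in useful ways. The sketch below shows a few further cases, including a palindrome test based on reversal. The word 'kayak' is just a sample value.

```python
s = 'zmrzlina'

print(s[:3] + s[3:] == s)    # True: splitting at any index loses nothing
print(s[-4:])                # lina: the last four characters
print(s[1::2])               # mzia: every second character, starting at index 1

word = 'kayak'               # sample word, chosen for illustration
is_palindrome = word == word[::-1]
print(is_palindrome)         # True
print(s == s[::-1])          # False
```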
Python includes both functions and methods in its standard library.
A function takes zero or more arguments and optionally returns a value. Some of Python's built-in functions that we've already seen in this course are len(), chr(), ord(), input() and print(). To call a function, we simply write its name followed by the arguments in parentheses:
n = input('Enter a number: ')
A method is like a function, but is invoked on a particular object. For example:
s = 'yoyo'
b = s.startswith('yo')    # method call
In the second line above, we are invoking (or calling) the startswith() method on the object s. (In Python a value and an object are the same thing.) This method call takes one additional argument, namely the string 'yo'. It returns a value, which is True in this case since the string 'yoyo' does start with 'yo'.
Both functions and methods are common in programming languages today. However, they are not both present in all languages: some languages have only functions, and others have only methods. Python is a bit of a hybrid since it has both functions and methods. This arguably makes the language more flexible and convenient, at the cost of some complexity.
In this course we will soon learn how to write our own functions, and before too long we'll learn how to write our own methods as well.
Our quick reference lists a number of built-in string methods, which can be quite useful. (As an exercise, you may wish to try writing programs that implement the functionality of some of these built-in methods.)
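As a starting point for that exercise, here is one possible sketch that reproduces two built-in string methods by hand, startswith() and count(), and compares each result with the built-in version. The variable names are chosen for this example, and this is only one of several reasonable ways to write it.

```python
s = 'yoyo'
prefix = 'yo'

# startswith() by hand: compare the first len(prefix) characters with the prefix.
manual_starts = s[:len(prefix)] == prefix
print(manual_starts, s.startswith(prefix))    # True True

# count() by hand, for a single character: loop over the string and tally matches.
manual_count = 0
for ch in s:
    if ch == 'o':
        manual_count += 1
print(manual_count, s.count('o'))             # 2 2
```

The slice in the first part is safe even when the prefix is longer than the string, because a slice that runs past the end is simply cut short.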