Some of today's topics are covered in these sections of Think Python:
14 Files
Here are some more notes.
In Python, the assert statement checks that a given condition is true. If it is false, the program will fail with an AssertionError. For example:
assert 0.0 <= prob <= 1.0
You may add an optional string which will be printed if the assertion fails:
assert 0.0 <= prob <= 1.0, 'probability must be between 0.0 and 1.0'
You can use assertions to verify conditions that should always be true unless there is a bug in the code. If such a condition is false, it is best to find out about it right away, rather than continuing to run and producing an incorrect result later, which may be difficult to debug.
You may also wish to use assertions to verify that arguments passed to a function are valid. For example:
# Compute the average value of numbers in list a
def avg(a):
    assert len(a) > 0, 'list must be non-empty'
    return sum(a) / len(a)
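As a minimal sketch of what a failed assertion looks like, the following repeats the avg() function and calls it with an empty list. The try/except block is there only so that the message can be displayed; it is an addition for this illustration, and normally an AssertionError is left uncaught so that the bug is noticed right away.

```python
# Compute the average value of numbers in list a (same function as above).
def avg(a):
    assert len(a) > 0, 'list must be non-empty'
    return sum(a) / len(a)

print(avg([2, 4, 6]))    # 4.0

# Catch the error only to show the message attached to the assertion.
try:
    avg([])
except AssertionError as e:
    message = str(e)
    print('assertion failed:', message)    # assertion failed: list must be non-empty
```

Note that Python skips assert statements when it is run with the -O option, so assertions are best treated as a debugging aid.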
When we run a program from the command line, we may specify command-line arguments. For example:
$ python3 prog.py hello one two three
In a Python program, sys.argv holds a list of all command-line arguments. Actually the first element in the list is the name of the program itself, and subsequent elements hold the arguments. For example, suppose that prog.py holds this program:
import sys

print(sys.argv)
Let's run it:
$ python3 prog.py hello one two three
['prog.py', 'hello', 'one', 'two', 'three']
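Every element of sys.argv is a string, even when the argument looks like a number. Here is a small sketch of how a program might convert its arguments to integers. The helper sum_args() and the program name sum.py are hypothetical, and the list is written out by hand so that the example runs the same way everywhere.

```python
# Hypothetical helper: add up the numeric arguments in an argv-style list.
def sum_args(argv):
    total = 0
    for s in argv[1:]:     # skip argv[0], the program name
        total += int(s)    # each argument is a string, so convert it
    return total

# What sys.argv would hold after running: python3 sum.py 10 20 30
example = ['sum.py', '10', '20', '30']
print(sum_args(example))   # 60
```

In a real program we would import sys and call sum_args(sys.argv) instead of building the list ourselves.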
To read from a file in Python, we can call the built-in open() function and pass a filename. open() will return a file object that we can use to read data from the file. We can use a 'for' loop to iterate over lines in the file, just as we can read lines of standard input by iterating over sys.stdin. When we are finished reading from the file, we should call the close() method to close the file object.
Here's a program that accepts a filename on the command line, and reports the number of lines and words in the file:
import sys

if len(sys.argv) < 2:
    print(f'usage: {sys.argv[0]} <filename>')    # usage message
    quit()

filename = sys.argv[1]
f = open(filename)
lines = 0
words = 0
for line in f:
    lines += 1
    words += len(line.split())
f.close()

print(f'lines = {lines}, words = {words}')
To write to a file in Python, we can call the open() function and pass a filename plus an extra argument 'w', indicating that we want to write. If the file does not exist, it will be created. If the file already exists, it will be truncated: all existing data in the file will be lost! We may then write lines to the file by calling the write() method on the file object once for each output line. When we are finished writing data, we should call close() to close the file.
Here's a program dup.py that accepts two filenames on the command line. It reads lines from the first file and writes them to the second file, duplicating every line as it writes:
import sys

if len(sys.argv) < 3:
    print(f'usage: {sys.argv[0]} <from-file> <to-file>')
    quit()

from_file = sys.argv[1]
to_file = sys.argv[2]

input = open(from_file, 'r')    # open for reading ('r' is default mode)
output = open(to_file, 'w')     # open for writing
for line in input:
    output.write(line)          # write line to output
    output.write(line)          # write it again
input.close()
output.close()
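The truncation that happens when an existing file is opened with 'w' can be seen in a small sketch. It works in a temporary directory so that no real file is overwritten. The filename demo.txt is hypothetical, and the tempfile and os modules are used here only for setup and cleanup.

```python
import os
import tempfile

d = tempfile.mkdtemp()                # temporary directory, so no real file is touched
path = os.path.join(d, 'demo.txt')    # hypothetical filename

f = open(path, 'w')                   # the file does not exist yet, so it is created
f.write('first line\n')               # write() adds no newline of its own
f.write('second line\n')
f.close()

f = open(path, 'w')                   # the file exists now, so its old contents are discarded
f.write('new line\n')
f.close()

f = open(path)                        # read the file back, line by line
contents = []
for line in f:
    contents.append(line)
f.close()

print(contents)                       # ['new line\n']

os.remove(path)                       # clean up the temporary file and directory
os.rmdir(d)
```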
You can read about more methods for input and output in our Python quick library reference.
The 'with' statement assigns a file object (or other resource) to a variable, then runs a block of code. When the block of code exits for any reason, the object is automatically closed, just as if you had called close() on the object. It's good practice to use 'with' when you open a file, to ensure that the file will be closed even if the program exits with an error.
Let's rewrite the previous program using 'with':
import sys

if len(sys.argv) < 3:
    print(f'usage: {sys.argv[0]} <from-file> <to-file>')
    quit()

from_file = sys.argv[1]
to_file = sys.argv[2]

with open(from_file, 'r') as input:       # open for reading
    with open(to_file, 'w') as output:    # open for writing
        for line in input:
            output.write(line)
            output.write(line)
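As a further sketch, here is the earlier line and word counter written as a function that uses 'with', followed by a check that the file really is closed when the block exits because of an error. The function name count(), the file sample.txt and the simulated ValueError are all assumptions made for this illustration.

```python
import os
import tempfile

# Count the lines and words in a file; 'with' closes the file for us.
def count(filename):
    lines = 0
    words = 0
    with open(filename) as f:
        for line in f:
            lines += 1
            words += len(line.split())
    return lines, words

# Build a small temporary file to try it on (hypothetical contents).
d = tempfile.mkdtemp()
path = os.path.join(d, 'sample.txt')
with open(path, 'w') as out:
    out.write('one two three\n')
    out.write('four five\n')

counts = count(path)
print(counts)       # (2, 5)

# Leave the 'with' block through an error: the file is closed anyway.
try:
    with open(path) as f:
        raise ValueError('simulated failure')
except ValueError:
    pass
print(f.closed)     # True

os.remove(path)     # clean up the temporary file and directory
os.rmdir(d)
```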
In Python and many other object-oriented languages, we may define our own data types, which are called classes. After we define a class, we may create instances of the class, which are called objects. An object has a set of attributes, which are data that belongs to the object. A class defines methods which may run on instances of the class.
In fact, Python is a purely object-oriented language, so in Python every type is a class, and every value is an object. For example, the value 3 is actually an object belonging to a class called 'int':
>>> type(3)
<class 'int'>
However, we usually use the terms "type" and "value" when discussing primitive types such as ints and floats, and we use the terms "class" and "object" when discussing user-defined classes and their instances. (Strictly speaking, in some languages types and classes are not exactly the same thing, but we don't need to concern ourselves with that at this point.)
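To illustrate the claim that every value is an object, the following sketch prints the class of several built-in values. The particular values are arbitrary choices.

```python
values = [3, 2.5, 'hello', [1, 2], None]

# Each value belongs to some class, including None.
names = []
for v in values:
    names.append(type(v).__name__)
print(names)                    # ['int', 'float', 'str', 'list', 'NoneType']

# Every value is an instance of the base class 'object'.
print(isinstance(3, object))    # True

# Since 3 is an object, we can call methods on it.
print((3).bit_length())         # 2, since 3 is 11 in binary
```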
As a first example, let's create a class Point. This is the smallest possible class definition in Python:
class Point:
    pass
In Python, the 'pass' statement does nothing. It's needed here, since a class definition may not be completely empty.
We may now create objects which are instances of class Point, and assign attributes to them:
>>> p = Point()
>>> p.x = 3
>>> p.y = 4
>>> q = Point()
>>> q.x = 10
>>> q.y = 20
>>> p.x
3
>>> q.y
20
Above, the function Point() is a constructor that we can call to create a Point object.
However, typically when we write any class we write an initializer method that takes the arguments (if any) that are passed to the constructor, and uses them to initialize the object, typically by creating attributes. For example:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
In Python, the name __init__ is special: it means that this method is an initializer, and will run automatically when a new instance of this class is created.
Every method has a parameter list that begins with a parameter that receives the object on which the method was invoked. Traditionally in Python this parameter is called 'self' (though actually it may have any name). In an initializer method, this parameter receives the object that is being created.
Let's create a couple of Point objects using this new initializer:
>>> p = Point(3, 4)
>>> q = Point(10, 20)
>>> p.x
3
>>> q.y
20
Notice in this example that when we call the constructor, we pass only two arguments, but the parameter list in __init__ has three parameters. That's because 'self' is an extra parameter that is passed automatically.
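The automatic passing of 'self' applies to every method call, not just to the initializer. As a small sketch, the method describe() below is a hypothetical addition used only to show that calling a method on an object is equivalent to calling it through the class and passing the object explicitly.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    # Hypothetical extra method, added only for this illustration.
    def describe(self):
        return f'point at ({self.x}, {self.y})'

p = Point(3, 4)

# The usual call: Python passes p as 'self' automatically.
a = p.describe()

# The same call written out explicitly through the class.
b = Point.describe(p)

print(a)         # point at (3, 4)
print(a == b)    # True
```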
Let's add a few methods to the Point class. It will now look like this:
import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    # Return the Euclidean distance from this point to the origin.
    def from_origin(self):
        return math.sqrt(self.x ** 2 + self.y ** 2)

    # Return true if this point is the origin (0, 0).
    def is_origin(self):
        d = self.from_origin()
        return d == 0

    # Return the distance between this point and another point q.
    def dist_from(self, q):
        return math.sqrt((self.x - q.x) ** 2 + (self.y - q.y) ** 2)
Notice that
from_origin() accesses the attributes 'x' and 'y' of the object 'self'
is_origin() calls the from_origin() method on the same object that it was invoked on
dist_from() accesses the attributes 'x' and 'y' of 'self', and also those same attributes of another Point object 'q'
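Here is a short sketch that exercises these methods. The class is repeated so that the example can be run on its own, and the sample coordinates are arbitrary.

```python
import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    # Return the Euclidean distance from this point to the origin.
    def from_origin(self):
        return math.sqrt(self.x ** 2 + self.y ** 2)

    # Return true if this point is the origin (0, 0).
    def is_origin(self):
        d = self.from_origin()
        return d == 0

    # Return the distance between this point and another point q.
    def dist_from(self, q):
        return math.sqrt((self.x - q.x) ** 2 + (self.y - q.y) ** 2)

p = Point(3, 4)
q = Point(6, 8)
o = Point(0, 0)

print(p.from_origin())    # 5.0
print(p.is_origin())      # False
print(o.is_origin())      # True
print(p.dist_from(q))     # 5.0
```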
Here are two more classes that we wrote in the lecture:
import math    # needed for math.sqrt() in Vector.length()

class Line:
    def __init__(self, p, q):    # create a line from p to q
        self.p = p
        self.q = q

    def length(self):
        return self.p.dist_from(self.q)

# a vector of any dimension
class Vector:
    def __init__(self, *args):
        self.a = args

    def length(self):
        s = 0
        for x in self.a:
            s += x * x
        return math.sqrt(s)

    # Return the dot product of self and w.
    def dot(self, w):
        assert len(self.a) == len(w.a)
        s = 0
        for i in range(len(self.a)):
            s += self.a[i] * w.a[i]
        return s
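The following sketch shows these two classes in use. To keep it self-contained it repeats the class definitions, along with a cut-down Point that has only the dist_from() method that Line needs. The sample values are arbitrary.

```python
import math

# Cut-down Point with just what Line needs.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def dist_from(self, q):
        return math.sqrt((self.x - q.x) ** 2 + (self.y - q.y) ** 2)

class Line:
    def __init__(self, p, q):    # create a line from p to q
        self.p = p
        self.q = q

    def length(self):
        return self.p.dist_from(self.q)

# a vector of any dimension
class Vector:
    def __init__(self, *args):   # *args collects all arguments into a tuple
        self.a = args

    def length(self):
        s = 0
        for x in self.a:
            s += x * x
        return math.sqrt(s)

    # Return the dot product of self and w.
    def dot(self, w):
        assert len(self.a) == len(w.a)
        s = 0
        for i in range(len(self.a)):
            s += self.a[i] * w.a[i]
        return s

l = Line(Point(0, 0), Point(3, 4))
print(l.length())        # 5.0

v = Vector(1, 2, 2)
w = Vector(4, 5, 6)
print(v.a)               # (1, 2, 2)
print(v.length())        # 3.0
print(v.dot(w))          # 26

# Vectors of different dimensions trip the assertion in dot().
try:
    v.dot(Vector(1, 2))
    failed = False
except AssertionError:
    failed = True
print(failed)            # True
```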
Above, we learned about __init__, which is a method with a special name that Python recognizes and which affects an object's behavior in a certain way. Python actually recognizes many different special method names, all of which begin and end with two underscores. Methods with these names are often called magic methods.
Another magic method in Python is called __repr__. If this method is defined in a class, Python calls it automatically whenever it needs to generate a string representation of an object. For example, this happens when you print out an object in the interactive Python console. By default, the string representation is just a blob of text with an ugly hexadecimal number:
>>> p = Point(3, 4)
>>> q = Point(10, 20)
>>> l = Line(p, q)
>>> p
<__main__.Point object at 0x7f173b1aa8b0>
>>> l
<__main__.Line object at 0x7f173b1aa7c0>
Let's add __repr__ methods to the Point and Line classes to define a nicer string representation for them:
class Point:
    ...
    def __repr__(self):
        return f'({self.x}, {self.y})'

class Line:
    ...
    def __repr__(self):
        return f'{self.p} - {self.q}'
Now Point and Line objects will print more nicely:
>>> p = Point(3, 4)
>>> q = Point(10, 20)
>>> l = Line(p, q)
>>> p
(3, 4)
>>> l
(3, 4) - (10, 20)
Notice that in the __repr__ method of the Line class, we wrote 'self.p' and 'self.q' in curly braces. In this situation, Python needs to convert self.p and self.q to strings. Since the Point class defines no other string conversion method, Python calls the __repr__ method of Point to perform that task.
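As a self-contained sketch, the classes below keep only their initializers and __repr__ methods. The built-in repr() function calls __repr__ directly. Printing a list of points is an extra illustration that is not part of the lecture: a list builds its own representation from the representations of its elements.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'({self.x}, {self.y})'

class Line:
    def __init__(self, p, q):
        self.p = p
        self.q = q

    def __repr__(self):
        # {self.p} and {self.q} are converted using Point's __repr__.
        return f'{self.p} - {self.q}'

p = Point(3, 4)
q = Point(10, 20)
l = Line(p, q)

print(repr(p))      # (3, 4)
print(repr(l))      # (3, 4) - (10, 20)
print(l)            # (3, 4) - (10, 20)
print([p, q])       # [(3, 4), (10, 20)]
```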