Programming 1, 2022-3
Week 13: Notes

writing good code

We now know enough Python that we are starting to write larger programs. Especially when writing a larger program, we want to write code in a way that is structured, clear, and maintainable. Functions and classes are essential tools for this purpose.

Here are three general rules to follow for writing good code:

Don't repeat yourself.

Beginning programmers often write code that looks like this:

if x > 0:
    … 20 lines of code …
else:
    … the same 20 lines of code, with a few small changes …

Or like this:

if key == 'w':   # up
    … 12 lines of code …
elif key == 'x':  # down
    … 12 very similar lines of code …
elif key == 'a':  # left
    … 12 very similar lines of code …
elif key == 'r':  # right
    … 12 very similar lines of code …

This code is poor. It is hard to read: the differences between the parallel blocks of code may be important, but they are hard to spot. And it is hard to maintain: every time you change one of the parallel blocks, you must change the others in the same way. That is a chore, and it's easy to forget to do it (or to do it incorrectly).

In some cases, you can eliminate this redundancy by writing code in a more general way, e.g. by looping over four possible direction vectors rather than having four blocks of code for the cases up, left, right, and down. Another useful tool is functions. You can often factor out parallel blocks of code into a function with parameters representing the (possibly small) differences in behavior between the parallel blocks. Then in place of the first code example above you can write

if x > 0:
    my_fun(… arguments …)
else:
    my_fun(… different arguments …)
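
As a rough sketch of the more general approach, the four key-handling blocks above could collapse into one table of direction vectors and one function. The names DIRECTIONS and move are hypothetical, and I'm assuming screen coordinates in which y grows downward:

```python
# Hypothetical key bindings: each key maps to a (dx, dy) direction vector.
# Assumes screen coordinates, where y grows downward.
DIRECTIONS = {
    'w': (0, -1),   # up
    'x': (0, 1),    # down
    'a': (-1, 0),   # left
    'r': (1, 0),    # right
}

def move(pos, key):
    """Return the position reached from pos by pressing key."""
    dx, dy = DIRECTIONS.get(key, (0, 0))   # unknown keys do not move
    x, y = pos
    return (x + dx, y + dy)

print(move((5, 5), 'w'))   # (5, 4)
print(move((5, 5), 'a'))   # (4, 5)
```

Now a change to the movement logic happens in one place, and the only differences between the four cases are the vectors in the table.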

Make variables as local as possible.

Generally speaking, a local variable is better than an attribute, and an attribute is better than a global variable.

The scope of a variable is the portion of a program's text in which the variable may be read or written. A local variable's scope is the function in which it appears. An attribute's scope is the class in which the attribute appears, at least if you only set the attribute in code in that class (which many languages will enforce, though not Python). A global variable's scope is the entire program. Another way of phrasing this rule is that a variable should have as small a scope as possible, which makes a program easier to understand since there are fewer places where a variable's value might change.
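
Here is a small illustration of the difference, using hypothetical names (running_total, add_all_global, add_all). Both functions compute the same sum, but only the second keeps its working variable local:

```python
# Discouraged: a global variable that any function in the program could modify.
running_total = 0

def add_all_global(nums):
    global running_total
    running_total = 0
    for n in nums:
        running_total += n

# Preferred: the sum is local, so it can change only inside this function.
def add_all(nums):
    total = 0
    for n in nums:
        total += n
    return total

print(add_all([1, 2, 3]))   # 6
```

To understand add_all() you only need to read its own body. To understand add_all_global() you must also check every other place in the program that might touch running_total.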

This rule does not apply to constants, i.e. read-only data. A global constant is unproblematic; it will never change, so its meaning is completely clear.

Every line and function should fit on a single screen.

I recommend limiting lines to about 100 characters. Lines that are longer than this are hard to read, especially since they may wrap in an editor window. I also recommend that you configure your editor to display a vertical gray line at the 100th column position, so that you can see if you've written a line that is too long. In Visual Studio Code, you can do that via the "editor.rulers" setting. Specifically, you can add this line to your settings.json file:

    "editor.rulers": [ 100 ],

In my opinion, functions should generally be limited to about 50 lines. Functions that are much longer than this quickly become hard to understand, and are hard to read since you may have to scroll up and down to see them.

Of course, there are many best practices for writing software beyond these three basic ideas. As one example, consider the order in which functions appear in a source file, or methods appear in a class. I generally recommend ordering functions and methods from lowest-level to highest-level, in the sense that if function A calls function B, then function A will appear later in the source file. In my opinion this leads to a program that is easy to read. However, as an exception to this rule, I recommend putting the __init__ method at the top of every class, even if it calls other methods in the class.
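
To make the ordering concrete, here is a sketch using a hypothetical Rectangle class. __init__ comes first, and after that each method appears below the methods it calls:

```python
class Rectangle:
    # __init__ comes first, even though it calls a method defined below it.
    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.validate()

    # Lowest-level helper: calls no other method of the class.
    def validate(self):
        assert self.width >= 0 and self.height >= 0

    def area(self):
        return self.width * self.height

    # Higher-level method: appears after area(), which it calls.
    def describe(self):
        return f'{self.width} x {self.height}, area {self.area()}'

print(Rectangle(3, 4).describe())   # 3 x 4, area 12
```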

testing software

Testing is the process of trying to discover bugs in a program and, as far as is possible, ensure that it is bug-free.

The most basic way to test your program is manually, i.e. by running it and manually entering input or using it interactively. This generally doesn't scale well to larger programs. It's also difficult to test a program thoroughly in this way.

In recent decades there has been a major trend toward automated testing. One common form of automated testing is unit tests, which test individual pieces of functionality such as individual modules, classes, methods and/or functions inside a program.

In Python, we can easily write simple unit tests using the 'assert' statement. For example, suppose that we've written a class PriorityQueue with methods add(x), remove_smallest() and is_empty(). Here is a naive implementation of this class:

class PriorityQueue:
    def __init__(self):
        self.a = []

    def is_empty(self):
        return len(self.a) == 0

    def add(self, x):
        self.a.append(x)

    def remove_smallest(self):
        x = min(self.a)
        i = self.a.index(x)
        return self.a.pop(i)

Here is a suite of several unit tests for the PriorityQueue class:

import random

def test1():
    q = PriorityQueue()
    assert q.is_empty()

def test2():
    q = PriorityQueue()

    for x in [4, 2, 6, 5]:
        q.add(x)

    for x in [2, 4, 5, 6]:
        assert not q.is_empty()
        assert q.remove_smallest() == x

def test3():
    q = PriorityQueue()
    
    random.seed(0)
    nums = [random.randrange(1000) for _ in range(100)]
    for x in nums:
        q.add(x)
    
    for x in sorted(nums):
        assert q.remove_smallest() == x

    assert q.is_empty()

def test():
    test1()
    test2()
    test3()
    print('All tests passed')

If any assertion fails when we call test(), the program will terminate with an AssertionError and a stack trace revealing which test caused the problem.

Notice that test3() calls the add() method with a series of random values. Random data of this sort can be very useful for testing. The function also calls random.seed(0), which ensures that random values will be the same on each run. That means that test results will be reproducible, which is a good thing.
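
You can check the effect of seeding with a few lines of code. In this sketch, sample() is a hypothetical helper that reseeds the generator and then draws five values:

```python
import random

def sample(seed):
    """Seed the generator, then return five pseudo-random integers."""
    random.seed(seed)
    return [random.randrange(1000) for _ in range(5)]

# The same seed always yields the same sequence of values.
assert sample(0) == sample(0)
```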

It's often a good idea to run all unit tests automatically each time a program is run. That ensures that you'll notice immediately if a change to the program causes a unit test to fail.
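
One simple way to arrange this, sketched here with a hypothetical square() function standing in for the real program, is to call the top-level test function just before the program's main function:

```python
def square(x):
    return x * x

def test_square():
    assert square(3) == 9
    assert square(-2) == 4

def test():
    test_square()

def main():
    print(square(7))

if __name__ == '__main__':
    test()     # run the unit tests first; a failure stops the program here
    main()
```

This works well as long as the tests are fast. If they become slow, you may prefer to run them separately.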

We wrote the unit tests above in pure Python, without using any libraries. To make writing tests easier, many testing frameworks are available for Python and other languages. The Python standard library contains the unittest and doctest modules, which are frameworks that you can use for writing tests. However, I generally recommend the pytest library, which you can install using pip. To use pytest, you can write tests using assert statements, as I've done in the PriorityQueue tests above. pytest will automatically discover tests implemented as functions with names beginning with 'test'. So if we are using pytest, we can delete the top-level function test() above, and simply run pytest on our source file:

$ pytest priority_queue.py 
============================= test session starts ==============================
platform linux -- Python 3.10.7, pytest-7.1.2, pluggy-1.0.0+repack
rootdir: /home/adam/Desktop/lecture
collected 3 items                                                              

priority_queue.py ...                                                    [100%]

============================== 3 passed in 0.01s ===============================
$

All the tests passed. Now let's intentionally introduce a bug. We'll change the last line of remove_smallest() to look like this:

return self.a.pop(i) + 1

Now let's run pytest again. Here is part of the output:

$ pytest priority_queue.py 
==================================== test session starts ====================================
platform linux -- Python 3.10.7, pytest-7.1.2, pluggy-1.0.0+repack
rootdir: /home/adam/Desktop/lecture
collected 3 items                                                                           

priority_queue.py .FF                                                                 [100%]

========================================= FAILURES ==========================================
___________________________________________ test2 ___________________________________________

    def test2():
        q = PriorityQueue()
    
        for x in [4, 2, 6, 5]:
            q.add(x)
    
        for x in [2, 4, 5, 6]:
            assert not q.is_empty()
>           assert q.remove_smallest() == x
E           assert 3 == 2
...

priority_queue.py:30: AssertionError
___________________________________________ test3 ___________________________________________

...

priority_queue.py:41: AssertionError
================================== short test summary info ==================================
FAILED priority_queue.py::test2 - assert 3 == 2
FAILED priority_queue.py::test3 - assert 2 == 1
================================ 2 failed, 1 passed in 0.02s ================================
$

We see that one of our tests (test1) passed, and the other two (test2 and test3) failed. Without pytest, we could still run these tests using the top-level test() function that we wrote before; however, the program would stop after the first failure. pytest also helpfully shows the source code of each test that failed.

In contrast to unit tests, integration tests test the functionality of groups of modules or the entire program, e.g. by sending input to the program and checking its output.
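
As a minimal sketch of an integration test, the code below runs a whole program in a child process, feeds it input, and checks what it prints. The program under test is hypothetical (it sums the integers on standard input) and is embedded as a string only so that the example is self-contained. In a real project it would be a separate source file.

```python
import subprocess
import sys

# Hypothetical program under test: reads integers from standard input,
# one per line, and prints their sum.
PROGRAM = '''
import sys
print(sum(int(line) for line in sys.stdin))
'''

def run_program(input_text):
    """Run the program in a child Python process and return its output."""
    result = subprocess.run([sys.executable, '-c', PROGRAM],
                            input=input_text, capture_output=True, text=True)
    return result.stdout

def test_sum():
    assert run_program('1\n2\n3\n').strip() == '6'
```

Unlike a unit test, this test knows nothing about the program's functions or classes. It only checks the behavior that a user would see.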

When we find and fix a bug in a program, we may write a regression test that tests that the bug is gone, and add it to our standard suite of tests for the program. That way we will find out immediately if we ever accidentally reintroduce the same bug in the future.
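
For example, suppose (hypothetically) that a median() function once gave wrong answers for lists of even length. After fixing it, we might add a test like the following, which will fail if the old behavior ever returns:

```python
def median(nums):
    """Return the median of a non-empty list of numbers."""
    s = sorted(nums)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    # Hypothetical earlier bug: this branch returned s[mid] alone,
    # so lists of even length gave the wrong answer.
    return (s[mid - 1] + s[mid]) / 2

def test_median_even_length():
    # Regression test for the hypothetical bug described above.
    assert median([1, 2, 3, 4]) == 2.5
    assert median([4, 1]) == 2.5
```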

If a program has a graphical interface, it might not be so easy to test it in an automated way. However, if you write your program using a model-view architecture then you should be able to write automated unit tests for the model portion of your program at least.
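
Here is a tiny sketch of the idea, with hypothetical names. CounterModel holds the program's state and rules, and view() only formats that state for display. The test exercises the model directly, so no window ever needs to open:

```python
class CounterModel:
    """Model: holds the state and the rules, with no drawing or input code."""
    def __init__(self, limit):
        self.limit = limit
        self.value = 0

    def increment(self):
        if self.value < self.limit:
            self.value += 1

def view(model):
    """View: turns the model's state into something to display."""
    return f'{model.value} / {model.limit}'

def test_counter_stops_at_limit():
    m = CounterModel(limit=2)
    for _ in range(5):
        m.increment()
    assert m.value == 2
```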

Companies that develop software generally spend a lot of time and energy on testing, and often encourage programmers to write lots of tests. Some people have even adopted the practice of test-driven development, in which programmers write unit tests for a function or class before they implement it.

Ideally our tests will check our program's entire set of functionality. There are tools for many languages that will measure test coverage, i.e. the percentage of a program's lines that execute as the suite of tests runs. For example, pytest-cov is a plugin that will produce a coverage report for tests run using pytest.

Suppose that our test suite for a program has 80% coverage. Then if there is a bug in the 20% of the lines that were not covered, the tests cannot possibly reveal it. So ideally a program's tests will have 100% coverage. However, 100% coverage is often difficult to achieve, since many lines in a program may handle obscure edge cases or error conditions, which may be difficult to trigger automatically.

Furthermore, even 100% coverage does not guarantee that tests will find every possible bug in a program. For example, consider an implementation of quicksort. We might write a single unit test that checks that sorting the array [4, 2, 6] results in the array [2, 4, 6]. This test alone is likely to achieve 100% coverage, since all of the lines of the quicksort function will run as the array is sorted. However, it is certainly possible that the implementation may have bugs which that single test case does not reveal.
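
To see how this can happen, consider the following deliberately buggy quicksort, written only for illustration. Sorting [4, 2, 6] executes every line and gives the right answer, yet the function silently discards duplicates of the pivot:

```python
def quicksort(a):
    if len(a) <= 1:
        return a
    pivot = a[0]
    smaller = [x for x in a[1:] if x < pivot]
    # Deliberate bug: values equal to the pivot are dropped (should be >=).
    larger = [x for x in a[1:] if x > pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)

print(quicksort([4, 2, 6]))   # [2, 4, 6]: the single test passes
print(quicksort([3, 1, 3]))   # [1, 3]: one of the 3s has been lost
```

A test with repeated values, or a randomized test like test3() above, would catch this bug even though it adds nothing to the coverage figure.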

To be very sure that a piece of software (or hardware) has no bugs, it is possible to write a mathematical proof of its correctness and to check the proof in an automated way. However, this is relatively expensive and difficult, so it's not done often in practice. Making this easier is an active area of research.
