Week 15: Notes

program sizes

Typically we measure a program's size in the number of lines that it contains. The cloc utility is useful for counting lines in any program.
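
To make "counting lines" concrete, here is a minimal sketch of the idea in Python. It is not how cloc itself works: it assumes a language whose only comments are whole lines starting with `#`, and the name `count_code_lines` is just an illustration.

```python
def count_code_lines(source: str) -> int:
    """Count lines that are neither blank nor whole-line '#' comments."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


sample = """\
# compute a square
def square(x):

    return x * x  # a trailing comment still counts as code
"""

print(count_code_lines(sample))  # prints 2
```

A real tool such as cloc also has to recognize each language's comment syntax, including block comments, which this sketch ignores.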

In the lecture we examined various real-world programs to see how large they are, and which languages they are written in. The line counts here were measured by cloc, and don't include blank lines or comments:

gnome-mines (a Minesweeper clone in GTK): 1,700 lines of Vala
gnome-terminal: 18,000 lines of C++
Pinta (a graphics editor): 36,000 lines of C#
Nautilus (the file manager on the GNOME desktop): 88,000 lines of C
GTK: 614,000 lines of C
Visual Studio Code: 920,000 lines of TypeScript
Firefox: 6.7M lines of JavaScript, 5.2M lines of C++, 2.6M lines of C, 2.5M lines of Rust
Linux kernel: 15.3M lines of C

As you can see, many programs are larger than you might think. Also, most programs grow continuously over time as developers add more features.

As a rough estimate, a good programmer might write 1,000 lines of code in a month. So a program with 1,000,000 lines of code might take something like 1,000 person-months, or roughly 83 person-years, to write. Probably no single person can understand every line in a program that large.
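
The same back-of-the-envelope arithmetic can be written as a tiny function. This is only a sketch of the estimate above: the 1,000 lines per month figure is the rough guess from these notes, and `person_years` is a made-up helper name.

```python
LINES_PER_MONTH = 1_000  # rough productivity guess from the notes, not a measured value


def person_years(total_lines: int, lines_per_month: int = LINES_PER_MONTH) -> float:
    """Estimate the effort to write total_lines of code, in person-years."""
    person_months = total_lines / lines_per_month
    return person_months / 12


print(round(person_years(1_000_000)))   # prints 83
print(round(person_years(15_300_000)))  # prints 1275 (a program the size of the Linux kernel's C code)
```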

organizing and structuring code

Be consistent. If you're writing a program by yourself, adopt a consistent style. If you're working on a program with multiple authors, use the same coding style that they are using.

Don't write lines that are hundreds of characters long - they will wrap in an editor and are hard to read. I personally recommend a maximum line length of 100 characters.

For ordering methods within a class, and classes within a source file, I generally prefer dependency order. In other words, if method A calls method B, then method B should appear before method A. Many beginning programmers simply arrange methods and classes in the order in which they were written. That's usually a poor idea, since that order isn't related to the final structure of the program.
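
As a small illustration of dependency order, here is a sketch in Python. The function names are invented for the example. Each function appears after everything it calls, so the file reads from the lowest-level helper up to the entry point.

```python
def is_vowel(ch: str) -> bool:
    # Lowest-level helper: it calls nothing else in this file, so it comes first.
    return ch.lower() in "aeiou"


def count_vowels(text: str) -> int:
    # Calls is_vowel, so it appears after is_vowel.
    return sum(1 for ch in text if is_vowel(ch))


def main() -> None:
    # Top-level entry point: depends on everything above, so it goes last.
    print(count_vowels("dependency order"))


if __name__ == "__main__":
    main()
```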

Your program should not have much (or any) commented-out code. If you're commenting out code because you think you might need to go back to it later, that's the wrong approach: version control is the right solution to that problem.

You probably don't need to use namespaces in a program until it has thousands of lines.

debugging

Print statements should usually be your first debugging technique. A debugger is also a useful tool.

git bisect can be a valuable tool for determining where a bug first appeared in a program's history. Using git bisect, you can mark one commit (typically the latest) as bad, and then mark another commit as good, representing a moment when the bug was known not to exist. git bisect will perform a binary search through commits between the good and bad commits, asking you in turn whether each commit is good or bad. This allows you to find the specific commit in which the bug was introduced.
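
The search that git bisect performs can be sketched in a few lines of Python. This is a simplified model, not git's actual implementation: it assumes a linear history listed oldest first, that the first commit is good and the last is bad, and that every commit after the first bad one is also bad. The commit names and the `check` function are hypothetical.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad commit, given commits[0] is good and commits[-1] is bad."""
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid    # the bug was introduced at mid or earlier
        else:
            good = mid   # the bug was introduced after mid
    return commits[bad]


history = [f"c{i}" for i in range(16)]  # hypothetical commit ids, oldest first
tested = []


def check(commit: str) -> bool:
    # Stand-in for "build this commit and see whether the bug occurs".
    tested.append(commit)
    return int(commit[1:]) >= 11  # pretend the bug was introduced in c11


print(first_bad_commit(history, check))  # prints c11
print(len(tested))                       # prints 4
```

With 16 commits, only 4 of them had to be tested. That is the benefit of binary search: the number of tests grows with the logarithm of the number of commits.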

optimization and profiling

Usually it's best to write a program in the simplest possible way at first, then optimize it later if necessary. If you try to make all parts of your program as fast as possible when you first write it, you may end up doing a lot of unnecessary work and making the program unnecessarily complicated. For this reason, the famous computer scientist Donald Knuth once said that "premature optimization is the root of all evil".

If you do need to optimize the code to make it faster, you should be aware that in most programs a small part of the code is the performance bottleneck. Optimizing other parts of the code will probably have no discernible impact on performance. So if you want to make your program faster, you need to find out where that bottleneck is.

One quick and easy technique for finding your program's bottleneck is to press Ctrl+C in a debugger, then examine the call stack. It is likely (though not guaranteed) that at the moment you pressed Ctrl+C, execution was in the region of your code where the program spends most of its time, i.e. the bottleneck.

Profiling code means analyzing its performance to find out which regions of code are using the most time and/or memory. You can manually profile your code by adding timing code that measures the execution time of functions or loops. This usually isn't too hard, and sometimes is the most flexible approach.
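
Manual timing can look something like the following sketch, which uses Python's standard `time.perf_counter`. The `timed` helper and the `sum_of_squares` workload are invented for the example.

```python
import time


def timed(label, func, *args):
    """Run func(*args), print how long it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.4f} s")
    return result, elapsed


def sum_of_squares(n):
    # Example workload to measure.
    return sum(i * i for i in range(n))


result, elapsed = timed("sum_of_squares", sum_of_squares, 200_000)
```

The measured time varies from run to run, so it's usually worth timing something several times before drawing conclusions.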

Alternatively, various tools exist that can profile your program, showing how much time is spent in each function or method. Some tools are based on sampling: they periodically interrupt your program and record the call stack at that moment. Other tools will instrument your code by adding instructions that measure timing.
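
As one example of such a tool, Python's standard library includes `cProfile`. It is a deterministic profiler that records every function call, so it is closer to the instrumenting approach than to sampling. The sketch below profiles a made-up workload; `slow_part`, `fast_part` and `work` are hypothetical names.

```python
import cProfile
import io
import pstats


def slow_part(n):
    # Deliberately slow loop: this should dominate the profile.
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_part(n):
    return n * 2


def work():
    return slow_part(200_000) + fast_part(10)


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Write the report into a string, sorted by cumulative time per function.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats()
report = buffer.getvalue()
print(report)
```

In the printed report, `slow_part` should account for nearly all of the time spent inside `work`, which identifies it as the bottleneck in this toy program.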

sysprof is a useful system-wide sampling profiler for Linux systems.