This week we discussed topics related to software engineering, including organizing and structuring code, debugging, optimization, performance profiling, build systems, and testing (in particular, unit tests).
Typically we measure a program's size by the number of lines it contains. The cloc utility is useful for counting lines in a program: it reports blank lines, comment lines and lines of code separately for each language it finds.
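To make "counting lines" concrete, here is a minimal Python sketch of the kind of classification cloc performs. It is not cloc's actual algorithm: the function name `count_lines` is hypothetical, and it assumes that comments are whole lines starting with `//` (real tools also handle block comments, strings and many languages).

```python
def count_lines(source: str, comment_prefix: str = "//") -> dict[str, int]:
    """Classify each line as blank, comment or code (a simplified, cloc-like count).

    Assumption: only whole-line comments starting with `comment_prefix` are
    recognized; block comments and trailing comments are counted as code.
    """
    counts = {"blank": 0, "comment": 0, "code": 0}
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith(comment_prefix):
            counts["comment"] += 1
        else:
            counts["code"] += 1
    return counts


example = "int x = 0;\n\n// increment x\nx++;\n"
print(count_lines(example))  # {'blank': 1, 'comment': 1, 'code': 2}
```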
In the lecture we looked at the sizes of various real-world programs. Here are some examples:
gedit: 42,000 lines of C
GTK: 590,000 lines of C
Visual Studio Code: 710,000 lines of TypeScript
LibreOffice: 4.1 million lines of C++
Linux kernel: 15.3 million lines of C
As a rough estimate, a good programmer might write 1,000 lines of code in a month. So a program with 1,000,000 lines of code might take something like 1,000 person-months, or about 83 person-years, to write. Probably no single person can understand every line in a program that large.
Here are some basic tips for organizing and structuring code:
Be consistent. If you're writing a program by yourself, adopt a consistent style. If you're working on a program with multiple authors, use the same coding style that they are using.
Don't write lines that are hundreds of characters long: they will wrap in an editor and are hard to read. I personally recommend a maximum line length of 100 characters.
For ordering methods within a class, and classes within a source file, I generally prefer dependency order. In other words, if method A calls method B, then method B should appear before method A. Many beginning programmers simply arrange methods and classes in the order in which they were written. That's usually a poor idea, since that order isn't related to the final structure of the program.
Your program should not have much (or any) commented-out code. If you're commenting out code because you think you might need to go back to it later, that's the wrong approach: version control is the right solution to that problem.
You probably don't need to use namespaces in a program until it has thousands of lines.
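Here is a small Python sketch of dependency order: each function appears after the functions it calls, so a reader going from top to bottom meets every helper before it is used. The calendar functions are hypothetical examples. (Python itself doesn't require this order, since names are looked up when a function runs, so here the ordering is purely for readability.)

```python
# Functions are listed in dependency order: helpers first, callers later.

def is_leap_year(year: int) -> bool:
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_month(year: int, month: int) -> int:
    """Number of days in `month` (1-12) of `year`; calls is_leap_year above."""
    if month == 2:
        return 29 if is_leap_year(year) else 28
    return 30 if month in (4, 6, 9, 11) else 31


def days_in_year(year: int) -> int:
    """Total number of days in `year`; calls days_in_month above."""
    return sum(days_in_month(year, m) for m in range(1, 13))


print(days_in_year(2024))  # 366
```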
Tips for debugging:
Print statements should usually be your first debugging technique. A debugger is also a useful tool.
git bisect can be an invaluable tool for determining where a bug first appeared in a program's history.
If you're programming in a low-level, memory-unsafe language such as C or C++, Valgrind can often point out bugs in which you write to memory that you haven't allocated. Such bugs can be very difficult to find using ordinary debugging techniques.
Talk to a rubber duck. Get some sleep.
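git bisect works by binary search over the commit history: you mark one commit as good and a later one as bad, and it repeatedly has you test a commit roughly halfway between them. The Python sketch below models that idea only, not git's implementation; it assumes the commits form a simple linear list and that once the bug appears it stays in every later commit. The names `first_bad_commit` and `has_bug` are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad_commit(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first commit for which is_bad() is true.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad (a linear history).
    """
    good, bad = 0, len(commits) - 1  # indices of a known-good and a known-bad commit
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):  # in real life: check out, build and test this commit
            bad = mid
        else:
            good = mid
    return commits[bad]


tested: list[int] = []


def has_bug(commit: int) -> bool:
    tested.append(commit)
    return commit >= 613  # pretend commit 613 introduced the bug


print(first_bad_commit(range(1000), has_bug), len(tested))  # 613 10
```

Because each test halves the remaining range, 1,000 commits need only about 10 tests. With real git you run `git bisect start`, mark commits with `git bisect good` and `git bisect bad`, and git checks out each midpoint for you (or runs a test script for you with `git bisect run`).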
As for optimization: usually it's best to write a program in the simplest possible way at first, then optimize it later if it turns out to be too slow.
In most programs, a small part of the code is the performance bottleneck. Optimizing other parts of the code will probably have no discernible impact on performance. So if you want to make your program faster, you first need to find out where that bottleneck is.
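One quick and easy technique for finding your program's bottleneck is to run it in a debugger, press Ctrl+C while it's busy to interrupt it, and examine the call stack. If you do this several times and the same function keeps showing up, that function is likely the bottleneck.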
You can manually profile your code by adding timing code that measures the execution time of functions or loops. This usually isn't too hard, and sometimes is the most flexible approach.
Various tools can profile your program, showing how much time is spent in each function or method. Some tools are based on sampling: they periodically interrupt your program and record its call stack at that moment. Other tools instrument your code by adding instructions that measure timing.
sysprof is a useful system-wide sampling profiler for Linux systems.
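As an example of the manual approach, here is a minimal Python timing helper built on `time.perf_counter()`, which measures elapsed wall-clock time with high resolution. The helper `timed`, the `timings` table and the two functions being compared are hypothetical; they just show the pattern of wrapping a suspicious region of code and accumulating its running time under a label.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Total seconds spent in each labeled region of code.
timings: defaultdict[str, float] = defaultdict(float)


@contextmanager
def timed(label: str):
    """Add the wall-clock time spent inside the `with` block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] += time.perf_counter() - start


def sum_of_squares_loop(n: int) -> int:
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_of_squares_formula(n: int) -> int:
    # Closed form for 0^2 + 1^2 + ... + (n-1)^2
    return (n - 1) * n * (2 * n - 1) // 6


with timed("loop"):
    a = sum_of_squares_loop(1_000_000)
with timed("formula"):
    b = sum_of_squares_formula(1_000_000)

assert a == b  # same answer, very different cost
for label, seconds in timings.items():
    print(f"{label}: {seconds * 1000:.3f} ms")
```

For a per-function profile without editing your code, Python's standard library also includes cProfile (e.g. `python -m cProfile -s cumulative myprogram.py`, where the script name is a placeholder), a deterministic profiler that records every function call and return rather than sampling.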
Almost any program that's larger than a few hundred lines will need some sort of build system. Here are some that we discussed in class:
MSBuild is Microsoft's build system. For C# it reads .csproj project files, which use an XML-based format and are generated automatically by tools such as Visual Studio or 'dotnet new'; the 'dotnet build' command invokes MSBuild automatically. MSBuild is OK for C# projects, but I wouldn't recommend it for larger programs in which various custom tasks need to be performed during the build process.
make is a classic UNIX build tool that works well for small to medium-sized projects.
ninja is a newer alternative to make that is much faster when building large programs. ninja build files are not intended to be written by hand; they are generated by higher-level build systems.
autotools (autoconf / automake) is a classic UNIX build system that generates Makefiles. You will often encounter it, but it is complicated and fairly ugly. I don't recommend using it for new projects.
CMake is a newer build system that is widely used. It can generate makefiles or ninja build files.
Meson is an even newer build system that can also generate makefiles or ninja build files. I generally recommend using Meson once a simple makefile is no longer adequate for your project.
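The core idea behind make is its timestamp rule: after first bringing a target's prerequisites up to date, make rebuilds the target if it doesn't exist or if any prerequisite is newer than it. Here is a simplified Python model of that rule, using an in-memory table of modification times instead of real files. The function `make`, its parameters and the file names are hypothetical; real make also handles pattern rules, phony targets, parallel builds and much more.

```python
from typing import Callable


def make(target: str,
         rules: dict[str, list[str]],
         mtime: dict[str, int],
         rebuild: Callable[[str], None]) -> None:
    """Bring `target` up to date using make's timestamp rule (simplified).

    rules:   target -> list of prerequisites; a name with no rule is a source file
    mtime:   file -> modification time; a missing entry means the file doesn't exist
    rebuild: runs the recipe for a target and must record its new mtime
    """
    prereqs = rules.get(target)
    if prereqs is None:
        if target not in mtime:
            raise FileNotFoundError(f"no rule to make target {target!r}")
        return  # an existing source file is always up to date
    for p in prereqs:
        make(p, rules, mtime, rebuild)  # update prerequisites first
    if target not in mtime or any(mtime[p] > mtime[target] for p in prereqs):
        rebuild(target)


# Example: a tiny C project (file names are made up).
rules = {
    "app": ["main.o", "util.o"],
    "main.o": ["main.c", "util.h"],
    "util.o": ["util.c", "util.h"],
}
mtime = {"main.c": 1, "util.c": 1, "util.h": 1}
clock = 10
built: list[str] = []


def fake_rebuild(target: str) -> None:
    global clock
    clock += 1
    mtime[target] = clock  # pretend the compiler just wrote this file
    built.append(target)


make("app", rules, mtime, fake_rebuild)
print(built)  # ['main.o', 'util.o', 'app']
```

Calling `make` again immediately rebuilds nothing, and after util.c is edited (given a newer time) only util.o and app are rebuilt. Skipping up-to-date targets is what makes incremental builds of large programs fast.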
Any serious program needs to be tested continuously during its development.
The most basic way to test your program is manually, i.e. by running it and entering input by hand or using it interactively. This generally doesn't scale well to larger programs, and it's difficult to test a program thoroughly this way.
In recent decades there has been a major trend toward automated testing. One common form of automated testing is unit testing: unit tests check individual pieces of functionality, such as single methods or classes inside a program. By contrast, system tests check the functionality of the entire program, e.g. by sending input to the program and checking its output.
Many test frameworks can help you write automated tests for your program. NUnit is a popular one for C#.
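NUnit itself is specific to C#, but the same style of testing (a class of test methods, each checking one behavior with assertions) exists in many languages. Here is a minimal sketch using Python's built-in `unittest` module; the `median` function is a hypothetical unit under test.

```python
import unittest


def median(values: list[float]) -> float:
    """Return the median of a non-empty list (the unit under test)."""
    if not values:
        raise ValueError("median of empty list")
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2  # even length: average the middle two


class MedianTest(unittest.TestCase):
    def test_odd_length(self):
        self.assertEqual(median([3, 1, 2]), 2)

    def test_even_length(self):
        self.assertEqual(median([4, 1, 3, 2]), 2.5)

    def test_single_value(self):
        self.assertEqual(median([7]), 7)

    def test_empty_list_raises(self):
        with self.assertRaises(ValueError):
            median([])
```

Saved in a file named, say, test_median.py, the tests can be run with `python -m unittest test_median`, which reports any failing test by name.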
If a program has a graphical interface, it may be harder to test it automatically. However, if you write your program using a model-view architecture, you should at least be able to write automated unit tests for the model portion of your program.
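Here is a minimal Python sketch of that idea, using a hypothetical tic-tac-toe game. The `TicTacToe` class is the model: it holds the board and enforces the rules but contains no drawing or event-handling code, so it can be tested directly. A graphical view (not shown) would call `play()` when the user clicks a square and redraw itself from `board`.

```python
from typing import Optional

# All eight winning lines, as lists of (row, column) squares.
WIN_LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]                    # rows
    + [[(r, c) for r in range(3)] for c in range(3)]                  # columns
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]    # diagonals
)


class TicTacToe:
    """Model: game state and rules only, with no user-interface code."""

    def __init__(self) -> None:
        self.board: list[list[Optional[str]]] = [[None] * 3 for _ in range(3)]
        self.turn = "X"

    def winner(self) -> Optional[str]:
        """Return "X" or "O" if that player has three in a row, else None."""
        for line in WIN_LINES:
            marks = {self.board[r][c] for r, c in line}
            if len(marks) == 1 and None not in marks:
                return marks.pop()
        return None

    def play(self, row: int, col: int) -> None:
        """Place the current player's mark on an empty square, then pass the turn."""
        if self.board[row][col] is not None or self.winner() is not None:
            raise ValueError("illegal move")
        self.board[row][col] = self.turn
        self.turn = "O" if self.turn == "X" else "X"


# Exercising the model needs no window at all:
game = TicTacToe()
for row, col in [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2)]:
    game.play(row, col)
print(game.winner())  # X
```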
To be very sure that software (or hardware) has no bugs, you can write a mathematical proof of its correctness and check that proof automatically using a computer program. However, this is relatively expensive and difficult, so it's not often done in practice. Making it easier is an active area of research.
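To give a flavor of what such a proof looks like, here is a tiny example in the Lean 4 proof assistant, which is just one of several tools of this kind. It defines a hypothetical list-append function `myAppend` and proves that, for all possible inputs, the length of the result is the sum of the input lengths; Lean checks every step of the proof mechanically. Real verification projects prove far stronger properties of far larger programs.

```lean
-- A hypothetical hand-written list-append function
def myAppend : List Nat → List Nat → List Nat
  | [], ys => ys
  | x :: xs, ys => x :: myAppend xs ys

-- Machine-checked claim, proved by induction on the first list:
-- the result's length is the sum of the input lengths.
theorem myAppend_length (xs ys : List Nat) :
    (myAppend xs ys).length = xs.length + ys.length := by
  induction xs with
  | nil => simp only [myAppend, List.length_nil] <;> omega
  | cons _ xs ih => simp only [myAppend, List.length_cons, ih] <;> omega
```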