Programming I, 2018-9
Lecture 1 - Notes

course overview

In this class we will essentially learn four things:

We will learn the Pascal programming language in depth. Pascal is an older language, invented around 1970, that is not widely used today. But I believe it is a fine language for learning programming and for exploring the introductory data structures and algorithms that we will study in this class. Pascal is a relatively simple language, and we should be able to cover most of its constructs within the first month, though we will continue to learn about more language details throughout the course.
We will learn about various elementary data structures. A data structure is a way of representing data in memory. Often the same data can be stored in several different ways, each with its own advantages and disadvantages.

For example, suppose that we have a dataset in memory containing the name and population of various cities. We may wish to perform various operations on this dataset such as (a) finding a city's population given its name; (b) finding a city's name given its population; (c) inserting a new city into the dataset; (d) deleting a city from the dataset, and so on. Depending on the data structure we use, some of these operations may be faster than others. Also, some data structures may use more memory than others. We will study these tradeoffs in this course. Some data structures we will learn about in this class are sorted arrays, linked lists, binary trees, binary heaps, and various graph representations.
We will study various elementary algorithms. An algorithm is a language-independent "recipe" for performing a computational task. For example, suppose that we have a set of city names in memory and we want to sort them alphabetically. There are many different sorting algorithms that we could use, including bubble sort, insertion sort, merge sort, heap sort and quicksort. Some of these algorithms are generally faster than others, and they also differ in other ways, such as in the amount of memory they use. We will learn all of these sorting algorithms in this class, as well as some number-theoretic algorithms (e.g. for factoring an integer) and some graph algorithms.
We will learn how to write programs by working through lots of programming exercises, both in class and as homework. The only way to learn to program well is to practice a lot, and that is a major focus of this class.

Some of you in this class may already have some programming experience which you gained either in an educational setting or informally through your own reading or experimentation. If you have programmed before, this class may be easy for a while. But there are good reasons to pay attention from the beginning. It is always valuable to review the basics of any subject. Also, in this class we will be studying programming in a relatively formal and disciplined way that should form a strong foundation for further work in computer science. This may complement your earlier informal knowledge. You should expect that this class will become more challenging when we cover pointers and recursion, which will be important subjects later in the course.

computers and programs

A computer is a general-purpose computation device that can run programs, which are sequences of instructions that tell a computer precisely what to do. A laptop computer, a tablet, a smart phone and even the chip on some debit cards are all computers.

At the hardware level, a computer generally contains a CPU (central processing unit), some amount of RAM (random access memory) and facilities for performing I/O (input/output). The CPU is the computer's "brain" and executes instructions in machine language, which is the only language that the hardware natively understands. Different CPUs use different varieties of machine language. Each machine language instruction performs only one tiny task such as adding two numbers or writing a number to memory.

It is possible to write programs directly in machine language, which consists only of numeric codes representing instructions. Programmers who do so usually write in assembly language, which is a textual representation of machine language. But these days almost all programs are written in high-level languages such as Pascal. To use any high-level language, we need either a compiler, which translates the high-level language to machine language, or an interpreter, which reads a program in a high-level language and executes it on the fly, without first translating it to machine language.

In this course we will use the Free Pascal compiler.

powers of 2

You will encounter powers of 2 in many places in computer science.

You should learn all of these powers by heart:

2⁰ = 1; 2¹ = 2; 2² = 4; 2³ = 8; 2⁴ = 16; 2⁵ = 32; 2⁶ = 64; 2⁷ = 128; 2⁸ = 256; 2⁹ = 512; 2¹⁰ = 1024

2¹⁰ = 1024, which is approximately 1000. We call this number 1 K (for "kilo"). Similarly,

2²⁰ = 1 M ("mega") ≈ 1,000,000
2³⁰ = 1 G ("giga") ≈ 1,000,000,000

You can quickly convert any power of 2 to an approximate value involving a symbol such as K, M or G. For example:

2¹⁶ = 2⁶ ⋅ 2¹⁰ = 64 K
2³² = 2² ⋅ 2³⁰ = 4 G
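As a quick check on this arithmetic, here is a small sketch of a program that prints every power of 2 up to 2¹⁶ by repeatedly doubling a running value, and then prints 64 K computed as 64 ⋅ 1024. It uses a for loop and Delphi mode, both described later in these notes; Delphi mode is needed here because 2¹⁶ = 65,536 does not fit in a 16-bit integer. The variable names are just for illustration.

```pascal
{$mode delphi}   // makes integer 32 bits, so 65536 fits

var
  i: integer;
  p: integer = 1;   // holds 2^i at the start of each iteration

begin
  for i := 0 to 16 do
    begin
      writeln('2^', i, ' = ', p);
      p := p * 2;   // double p to get the next power of 2
    end;
  writeln('64 K = ', 64 * 1024);
end.
```

The last two lines of output should agree, since 2¹⁶ is exactly 64 K.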

storing integers

At the hardware level, a computer stores a number in binary representation, i.e. in base 2. For example, the number 37₁₀ equals 100101₂. Here, the subscript 10 means "base 10", i.e. the ordinary decimal system, and the subscript 2 means "base 2", i.e. binary. Here is the mathematical meaning of these representations:

37₁₀ = 3 ⋅ 10¹ + 7 ⋅ 10⁰ = 37
100101₂ = 1 ⋅ 2⁵ + 0 ⋅ 2⁴ + 0 ⋅ 2³ + 1 ⋅ 2² + 0 ⋅ 2¹ + 1 ⋅ 2⁰ = 37

Every non-negative integer has a unique representation in base 2, apart from leading zeros (and the same holds in any other base).

If you are not familiar with writing numbers in different bases, you should study this topic since we will be using bases extensively in this course. In particular, we will soon write Pascal programs to convert between different bases. The page Numbers in Different Bases at the Oxford Math Center would be a good place to start.
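As a preview, here is one possible sketch of such a conversion, from decimal to binary. It repeatedly takes the lowest binary digit of n (n mod 2) and then removes that digit (n div 2), building the result from right to left. It uses for, if, div and mod, which are described later in these notes, plus string concatenation with +, which we have not covered yet. The choice of exactly 8 digits and the variable names are assumptions made for illustration.

```pascal
var
  n: integer = 37;      // the number to convert
  bits: string = '';    // binary digits collected so far
  i: integer;

begin
  for i := 1 to 8 do
    begin
      // n mod 2 is the lowest binary digit of n; put it at the front of bits
      if n mod 2 = 1 then
        bits := '1' + bits
      else
        bits := '0' + bits;
      n := n div 2;     // drop the lowest binary digit
    end;
  writeln(bits);
end.
```

The program prints 00100101, which is 100101₂ padded with leading zeros to 8 digits.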

A binary number such as 100101₂ consists of a series of binary digits, otherwise known as bits. A bit is a fundamental concept in computer science and is simply a 0 or a 1.

Consider the integers that can be stored in 3 bits. They are

000₂ = 0, 001₂ = 1, 010₂ = 2, 011₂ = 3, 100₂ = 4, 101₂ = 5, 110₂ = 6, 111₂ = 7

These are the integers from 0 to 7 inclusive. There are 8 such integers, which makes sense since there are 8 possible combinations of 3 bits: there are 2 possibilities for each bit, and 2 ⋅ 2 ⋅ 2 = 8.

More generally, with n bits we can represent any non-negative value from 0 to 2ⁿ – 1, inclusive. In computer science, an integer value that cannot be negative is known as an unsigned integer.

It is also possible to encode negative numbers in binary. The details of this encoding are beyond the scope of this course. However, you should know that if we are storing a signed integer, which can be positive, zero or negative, then with n bits we can represent any value from -2ⁿ⁻¹ to 2ⁿ⁻¹ – 1, inclusive.

The following sizes of integers are especially common in computer programming:

An unsigned 8-bit integer is known as a byte, and can hold values from 0 to 2⁸ – 1, i.e. from 0 to 255.
A signed 16-bit integer can hold values from -2¹⁵ to 2¹⁵ – 1, i.e. from -32,768 to 32,767.
A signed 32-bit integer can hold values from -2³¹ to 2³¹ – 1, i.e. from -2,147,483,648 to 2,147,483,647.
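Free Pascal has built-in types with exactly these sizes: byte (unsigned 8-bit), smallint (signed 16-bit) and longint (signed 32-bit). We have not covered these type names yet, but the following sketch uses the standard functions low and high to print the smallest and largest value of each type, which should match the ranges above.

```pascal
begin
  // low(T) and high(T) give the smallest and largest values of an ordinal type T
  writeln(low(byte), ' to ', high(byte));
  writeln(low(smallint), ' to ', high(smallint));
  writeln(low(longint), ' to ', high(longint));
end.
```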

At the hardware level, virtually every computer's memory is organized as a series of bytes. We often measure memory in KB (kilobytes), MB (megabytes) or GB (gigabytes), using the powers of 2 above. For example, 1 KB = 2¹⁰ = 1,024 bytes.

storing text

Computers generally store text using a coded character set, which assigns a unique number to each of a set of characters. Two character sets are used in virtually all software systems today:

The ASCII character set assigns a 7-bit value (i.e. an integer from 0 to 127) to each character. For example, in ASCII the character 'A' has the number 65, and 'B' has the number 66. ASCII includes all the characters you see on a standard English-language keyboard: the uppercase and lowercase letters A-Z/a-z of the Latin alphabet, the digits 0-9 and various punctuation marks such as $, % and &. ASCII does not include accented characters such as č or ř.
The newer Unicode character set extends ASCII to include the characters of virtually all of the world's writing systems, including accented characters and also ideographic characters used in Asian languages, such as 日. Unicode assigns each character a code point, which is an integer from 0 to 1,114,111 (10FFFF in hexadecimal), so every code point fits in 21 bits.

Every ASCII character fits in a single byte, since a 7-bit value also fits in 8 bits. By contrast, a Unicode character whose code point is larger than 255 can only be represented using a sequence of bytes.
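In Pascal you can see these codes directly: the standard function ord returns the number of a character, and chr converts a number back into a character. Neither function has been covered yet; this is just a small illustration using ASCII characters.

```pascal
begin
  writeln(ord('A'));   // the ASCII code of 'A'
  writeln(ord('B'));   // the ASCII code of 'B'
  writeln(chr(97));    // the character whose ASCII code is 97
end.
```

The program prints 65, 66 and a, one per line.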

Pascal data types

Pascal programs can manipulate various types of data. In a Pascal program, every variable, expression and value has a specific type.

We will learn about various Pascal types throughout this course. We will begin with these 5 basic types:

integer – a 16-bit signed integer or a 32-bit signed integer, depending on whether you have enabled Delphi mode (see below)
real - a floating-point number, i.e. a number such as 2.347868 that can have digits after the decimal point. real values have a much larger range than integers: for example, a real can hold the number 10⁵⁰. However, real values are only approximate: they have only 15-16 significant digits (possibly fewer on some platforms), so a number that needs more digits than that, such as 10⁵⁰ + 1, cannot be stored exactly.
boolean - either true or false.
char - a single-byte character. Any ASCII character can fit in a char; in general Unicode characters cannot.
string - a sequence of characters. Without Delphi mode, a string is limited to 255 characters. If you enable Delphi mode, a string may have any length. A string may contain non-ASCII characters such as ř. Such characters are represented using multiple bytes in the string.
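Here is a sketch of a program that declares one variable of each of these five types and writes them out. It enables Delphi mode (described below) so that integer can hold 2,147,483,647. The notation r:0:6, which we have not covered yet, asks writeln to print r with 6 digits after the decimal point rather than in its default exponential format. The variable names and values are arbitrary.

```pascal
{$mode delphi}   // integer is 32 bits in Delphi mode

var
  n: integer = 2147483647;   // the largest 32-bit signed value
  r: real = 2.347868;
  b: boolean = true;
  c: char = 'A';
  s: string = 'hello';

begin
  writeln(n);
  writeln(r:0:6);   // minimum width 0, 6 digits after the decimal point
  writeln(b);       // Free Pascal writes booleans as TRUE or FALSE
  writeln(c);
  writeln(s);
end.
```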

hello, world

Here is a first Pascal program that is about as simple as it gets:

begin
  writeln('hello, world');
end.

To execute this program on your computer, you'll first need to type it into a text editor and save it to a file with a name such as 'hello.pas'. You can then compile the program and run the resulting executable. For more information about how to do that, see my page on editing and compiling Pascal programs.

When you run the program, it will produce this output:

  hello, world

Note the following in the program above:

The statement to be executed is enclosed by begin and end.
The statement is followed by a semicolon (;). We will almost always write a semicolon after every statement in a Pascal program.
There is a period (.) at the end of the program. This period is required.

Delphi mode

To enable the Delphi dialect of Free Pascal, include this at the top of your source file:

{$mode delphi}

This has important consequences:

integer represents a 32-bit signed integer rather than a 16-bit signed integer.
string values are not limited to 255 characters.

I recommend that you put this directive at the top of every program. (To save space, however, I will not generally write it at the top of programs in these lecture notes.)

comments

A comment is text that the compiler will ignore. Good programmers often add comments to their code to explain to others (or even to themselves) what a program is doing.

There are three different syntaxes for comments in a Pascal program:

// 1. this is a single-line comment

{ 2. this is a comment,
     and may span multiple lines }

(* 3. this comment can also
      span
      multiple lines *)

Generally I recommend using either syntax 1 or 2 above.

declaring variables

To do much of anything in a Pascal program you will need some variables. A variable is a named location in memory that holds a value of a particular type; that value can change over time as your program runs.

A variable declaration looks like this:

var
  x: integer;
  r: real;

You may declare several variables of the same type together:

var
  s, t, u: string;

When declaring a single variable, you may give it an initial value:

var
  a: integer = 4;

assignment statements

An assignment statement puts a new value into a variable. Here is a program that uses some assignment statements:

var
  x: integer;
  y: integer;

begin
  x := 4;
  y := x + 1;
  x := y + 2;
  writeln('now x is ', x, ' and y is ', y);
end.

When you run this program, x and y are initially undefined, because the program does not give those variables initial values when it declares them. The first assignment sets x to 4. The second sets y to x + 1 = 4 + 1 = 5. Then the third sets x to y + 2 = 5 + 2 = 7. Note that this last assignment changes the value of x, but not the value of y.

The program will print

  now x is 7 and y is 5

Note that a single call to writeln can write out multiple values. We will use this capability often. Also note that writeln can write values of any of the types we have seen so far (i.e. integer, real, boolean, char, string).

An assignment statement can include the same variable name on the left and right sides. For example, the statement

  x := x + 1;

adds one to the existing value of x. If x was 4 before the statement executes, it will be 5 afterward.

arithmetic operators

The program in the preceding section uses the arithmetic operator +. Pascal includes these arithmetic operators:

+ : addition
- : subtraction
* : multiplication
/ : floating-point division
div : integer division (truncating toward zero)
mod : integer remainder

The +, - and * operators can operate on integers or reals. If given two expressions of type integer, these operators will return an integer; otherwise they return a real value.

/ can operate on integers or reals; it always returns a real. That means that this program will not compile:

var
  x: integer = 4;

begin
  x := x / 2;  // ERROR: the / operator returns a real, which cannot be assigned to an integer variable
  writeln(x);
end.

The div operator can operate only on integers, and returns an integer. For example, 23 div 3 = 7, since when we divide 23 by 3 the quotient is 7, ignoring the remainder. In other words, when we truncate the fraction 23/3 toward zero we get the integer 7.

We can rewrite the program above as follows:

var
  x: integer = 4;

begin
  x := x div 2;  // compiles with no problem
  writeln(x);
end.

The mod operator can operate only on integers, and returns an integer. As an example, 23 mod 3 = 2, which is the remainder after dividing 23 by 3.
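Here is a small sketch that prints a few div and mod results. The last two lines use a negative dividend: since div truncates toward zero, -23 div 3 is -7 rather than -8, and -23 mod 3 then comes out as -2, so that the identity (a div b) ⋅ b + a mod b = a still holds, because (-7) ⋅ 3 + (-2) = -23.

```pascal
var
  a: integer = 23;
  b: integer = 3;

begin
  writeln(a div b);                  // quotient, fraction discarded
  writeln(a mod b);                  // remainder
  writeln((a div b) * b + a mod b);  // quotient times divisor plus remainder gives back a
  writeln((-a) div b);               // truncated toward zero, not rounded down to -8
  writeln((-a) mod b);               // the remainder has the sign of the dividend
end.
```

The program prints 7, 2, 23, -7 and -2, one per line.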

reading input

You can use readln to read a value from a program's standard input (which is usually the keyboard). Here is a program that reads two numbers and prints their sum:

var
  x, y: integer;

begin
  write('Enter x: ');
  readln(x);

  write('Enter y: ');
  readln(y);

  writeln('The sum is ', x + y);
end.

If we run this program and enter some numbers, the output might look like this:

Enter x: 4
Enter y: 7
The sum is 11

Note that in this program we used write rather than writeln. The difference between these is that writeln writes a newline character at the end of its output, which causes the cursor to move to the beginning of the next line. write does not do this. If we had used writeln rather than write in this program, the output might look like this:

Enter x:
4
Enter y:
7
The sum is 11

Similarly, there are separate procedures read and readln. readln reads one or more values from a single line of input, and then discards the rest of the input line. read also reads one or more values, but remains on the same input line so that a subsequent call to read (or readln) can read more values from that line.
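For example, the following sketch reads two numbers typed on the same line, such as 4 7. The call to read consumes only the first number, so the following readln can still read the second number from that line. (A single call readln(x, y) would have the same effect here.)

```pascal
var
  x, y: integer;

begin
  write('Enter x and y on one line: ');
  read(x);      // reads the first number; the rest of the line is still available
  readln(y);    // reads the second number, then discards the rest of the line
  writeln('The sum is ', x + y);
end.
```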

constant declarations

We have seen how to declare variables. It will also sometimes be useful to declare constants:

const
  Seconds = 60;

The expression in a constant declaration may contain other constants and operators, but no variables:

const
  Microseconds = Seconds * 1000 * 1000;
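Here is a sketch of a complete program that prints both constants. If we read Seconds as the number of seconds in a minute (an interpretation; the declarations above do not say so), then Microseconds is the number of microseconds in a minute.

```pascal
const
  Seconds = 60;                          // assumed meaning: seconds in a minute
  Microseconds = Seconds * 1000 * 1000;  // computed from another constant

begin
  writeln(Seconds);
  writeln(Microseconds);
end.
```

The program prints 60 and then 60000000.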

if

The if statement executes one or more statements if a condition is true:

if x > 3 then
  begin
    y := x;
    z := x + 1;
  end;

If there is only one statement to be executed, you may omit the begin and end.

An if statement may optionally include an else clause indicating one or more statements to be executed if the condition is false:

if x > 3 then
  y := x + 1
else
  begin
    z := x + 1;
    y := x - 1;
  end;

Warning: do not put a semicolon before the else, or the compiler will complain! This is the one place in Pascal where a statement must not be followed by a semicolon.

Here is a program that uses if to print a message indicating whether a number is greater than 7.

var
  x: integer;

begin
  write('Enter x: ');
  readln(x);
  if x > 7 then
    writeln('greater than 7')
  else
    writeln('not greater than 7');
end.

comparison operators

The program in the preceding section uses the comparison operator > (greater than). Pascal includes these comparison operators:

= : equal
<> : not equal
< : less than
> : greater than
<= : less than or equal
>= : greater than or equal

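These operators can compare two values of any primitive type (boolean, integer, real, char, string). Strings are compared character by character using their character codes, so for ASCII text the order is alphabetical, except that all uppercase letters come before all lowercase letters.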

Note especially the difference between the assignment operator := and the comparison operator =. If you want to assign a value to a variable in Pascal, you must use :=.
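Here is a sketch showing a few comparisons, each written out directly with writeln (Free Pascal writes boolean values as TRUE or FALSE). The string results follow from the ASCII codes: 'a' is 97, 'b' is 98, 'A' is 65 and 'Z' is 90.

```pascal
begin
  writeln('apple' < 'banana');  // 'a' (97) comes before 'b' (98)
  writeln('Zebra' < 'apple');   // 'Z' (90) comes before 'a' (97)
  writeln('apple' = 'Apple');   // 'a' (97) and 'A' (65) are different characters
  writeln('car' < 'cart');      // a string comes before a longer string that starts with it
  writeln(3 <> 4);
end.
```

The program prints TRUE, TRUE, FALSE, TRUE and TRUE, one per line.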

for loops

A for…to statement loops over a series of values. For each value in the series, it executes the body of the loop, which consists of one or more statements. On each iteration of the loop, the loop variable is set to the next value in the series.

Here is a program that uses for to print all values from 1 to 10:

var
  i: integer;

begin
  for i := 1 to 10 do
    writeln(i);
end.

You must declare the loop variable (i in this case) in a var section before you can use it in a for statement. The loop variable must have an ordinal type: of the types we have seen so far, it may be integer, char or boolean, but not real or string.
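Since char is allowed, a for loop can step through a range of characters in order of their character codes. Here is a minimal sketch:

```pascal
var
  c: char;

begin
  for c := 'a' to 'e' do
    write(c);   // write keeps all the letters on one line
  writeln;      // end the output line
end.
```

The program prints abcde.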

Just as with the if statement, you can use begin and end to enclose multiple statements in the loop body. For example:

for i := 1 to n do
  begin
    writeln(i);
    writeln(i * 2);
  end;

To loop through decreasing values, use downto rather than to:

for i := 10 downto 1 do
  writeln(i);
writeln('liftoff');

computing a sum

Let's write a program that reads an integer N such as 5 and prints output such as the following:

1 + 2 + 3 + 4 + 5 = 15

We can accomplish this by using a for statement to loop through the values from 1 to N. In the loop body we will perform two tasks:

Add the current value to the running total, so that at the end of the loop we will know the sum of all the numbers.
Write out the current value.

Here is the program:

var
  n: integer;
  i: integer;
  sum: integer = 0;

begin
  write('Enter n: ');
  readln(n);

  for i := 1 to n do
    begin
      sum := sum + i;
      write(i);
      if i < n then
        write(' + ');
    end;

  writeln(' = ', sum);
end.

Study this program and make sure you understand why the if statement in the loop body is necessary.
