In this class we will essentially learn four things:
We will learn the Pascal programming language in depth. Pascal is an older language that was invented around 1970 and is not used so much today. But I believe it is a fine language for learning programming and exploring introductory data structures and algorithms as we will do in this class. Pascal is a relatively simple language and we should be able to cover most of its constructs within the first month, though we will learn about more language details throughout the course.
We will learn about various elementary data structures. A data structure is a way of representing data in memory. Often a single set of data can be stored in various ways, each of which may have various advantages and disadvantages.
For example, suppose that we have a dataset in memory containing the name and population of various cities. We may wish to perform various operations on this dataset such as (a) finding a city's population given its name; (b) finding a city's name given its population; (c) inserting a new city into the dataset; (d) deleting a city from the dataset, and so on. Depending on the data structure we use, some of these operations may be faster than others. Also, some data structures may use more memory than others. We will study these tradeoffs in this course. Some data structures we will learn about in this class are sorted arrays, linked lists, binary trees, binary heaps, and various graph representations.
We will study various elementary algorithms. An algorithm is a language-independent "recipe" for performing a computational task. For example, suppose that we have a set of city names in memory and we want to sort them alphabetically. There are many different sorting algorithms that we could use including bubble sort, insertion sort, merge sort, heap sort and quicksort. Some of these algorithms are generally faster than others, and they differ in other ways such as in the amount of memory they use. We will learn all of these sorting algorithms in this class as well as some number-theoretic algorithms (e.g. for factoring an integer) and some graph algorithms as well.
We will learn how to write programs by working through lots of programming exercises, both in class and as homework. The only way to learn to program well is to practice a lot, and that is a major focus of this class.
Some of you in this class may already have some programming experience which you gained either in an educational setting or informally through your own reading or experimentation. If you have programmed before, this class may be easy for a while. But there are good reasons to pay attention from the beginning. It is always valuable to review the basics of any subject. Also, in this class we will be studying programming in a relatively formal and disciplined way that should form a strong foundation for further work in computer science. This may complement your earlier informal knowledge. You should expect that this class will become more challenging when we cover pointers and recursion, which will be important subjects later in the course.
A computer is a general-purpose computation device that can run programs, which are sequences of instructions that tell a computer precisely what to do. A laptop computer, a tablet, a smart phone and even the chip on some debit cards are all computers.
At the hardware level, a computer generally contains a CPU (central processing unit), some amount of RAM (random access memory) and facilities for performing I/O (input/output). The CPU is the computer's "brain" and executes instructions in machine language, which is the only language that the hardware natively understands. Different CPUs use different varieties of machine language. Each machine language instruction performs only one tiny task such as adding two numbers or writing a number to memory.
It is possible to write programs directly in machine language. Programmers who do so usually write in assembly language, which is a textual representation of machine language (machine language itself consists only of numeric codes representing instructions). But these days almost all programs are written in high-level languages such as Pascal. To use any high-level language, we need either a compiler, which translates the high-level language to machine language, or an interpreter, which reads a program in a high-level language and executes it on the fly, without first translating it to machine language.
In this course we will use the Free Pascal compiler.
You will encounter powers of 2 in many places in computer science.
You should learn all of these powers by heart:
$2^0 = 1$; $2^1 = 2$; $2^2 = 4$; $2^3 = 8$; $2^4 = 16$; $2^5 = 32$; $2^6 = 64$; $2^7 = 128$; $2^8 = 256$; $2^9 = 512$; $2^{10} = 1024$
$2^{10} = 1024$, which approximately equals 1000. We call this number 1 K (for "kilo"). Similarly,
$2^{20}$ = 1 M ("mega") ≈ 1,000,000
$2^{30}$ = 1 G ("giga") ≈ 1,000,000,000
You can quickly convert any power of 2 to an approximate value involving a symbol such as K, M or G. For example:
$2^{16} = 2^6 \cdot 2^{10}$ = 64 K
$2^{32} = 2^2 \cdot 2^{30}$ = 4 G
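Here is a small Free Pascal sketch that prints the table of powers above by repeated doubling. It uses a for loop and a begin/end block, both introduced later in these notes; the variable names i and p are just illustrative choices.

```pascal
var
  i: integer;   // the exponent
  p: integer;   // current power of 2; even the final value 2048 fits in a 16-bit integer
begin
  p := 1;                  // 2^0 = 1
  for i := 0 to 10 do
  begin
    writeln('2^', i, ' = ', p);
    p := p * 2;            // each power of 2 is twice the previous one
  end;
end.
```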
At the hardware level, a computer stores a number in binary representation, i.e. in base 2. For example, the number $37_{10}$ equals $100101_2$. Here, the subscript 10 means "base 10", i.e. the ordinary decimal system. The subscript 2 means "base 2", i.e. binary. Here is the mathematical meaning of these representations:
$37_{10} = 3 \cdot 10^1 + 7 \cdot 10^0 = 37$
$100101_2 = 1 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 = 37$
Every integer has a unique representation in base 2 (and in any other base), apart from leading zeros.
If you are not familiar with writing numbers in different bases, you should study this topic since we will be using bases extensively in this course. In particular, we will soon write Pascal programs to convert between different bases. The page Numbers in Different Bases at the Oxford Math Center would be a good place to start.
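As a small preview of such a program, here is a sketch that converts the binary digits $100101_2$ from the example above back to decimal. Instead of summing the powers of 2 term by term, it doubles a running value and then adds the next bit (Horner's rule), which produces the same sum. It relies on string indexing (s[i]) and the built-in length and ord functions, none of which have been introduced yet; the variable names are illustrative.

```pascal
var
  s: string;          // binary digits, most significant digit first
  i, value: integer;
begin
  s := '100101';
  value := 0;
  for i := 1 to length(s) do
    // shift the value one binary place to the left, then add this bit (0 or 1)
    value := value * 2 + (ord(s[i]) - ord('0'));
  writeln(s, ' in base 2 = ', value, ' in base 10');
end.
```

Tracing the loop, value takes the successive values 1, 2, 4, 9, 18, 37.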
A binary number such as $100101_2$ consists of a series of binary digits, otherwise known as bits. A bit is a fundamental concept in computer science and is simply a 0 or a 1.
Consider the integers that can be stored in 3 bits. They are
$000_2 = 0$, $001_2 = 1$, $010_2 = 2$, $011_2 = 3$, $100_2 = 4$, $101_2 = 5$, $110_2 = 6$, $111_2 = 7$
These are the integers from 0 to 7 inclusive. There are 8 such integers, which makes sense since there are 8 possible combinations of 3 bits: there are 2 possibilities for each bit, and $2 \cdot 2 \cdot 2 = 8$.
More generally, with n bits we can represent any non-negative value from 0 to $2^n - 1$, inclusive. In computer science, an integer value that cannot be negative is known as an unsigned integer.
It is also possible to encode negative numbers in binary. The details of this encoding are beyond the scope of this course. However, you should know that if we are storing a signed integer which can be positive, zero, or negative, then with n bits we can represent any value from $-2^{n-1}$ to $2^{n-1} - 1$, inclusive.
The following sizes of integers are especially common in computer programming:
An unsigned 8-bit integer is known as a byte, and can hold values from 0 to $2^8 - 1$, i.e. from 0 to 255.
A signed 16-bit integer can hold values from $-2^{15}$ to $2^{15} - 1$, i.e. from -32,768 to 32,767.
A signed 32-bit integer can hold values from $-2^{31}$ to $2^{31} - 1$, i.e. from -2,147,483,648 to 2,147,483,647.
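To check these ranges, the following sketch asks Free Pascal itself. It assumes the Free Pascal type names byte (unsigned 8-bit), smallint (signed 16-bit) and longint (signed 32-bit), and uses the built-in Low and High functions, which return the smallest and largest values of a type.

```pascal
begin
  // Low(T) and High(T) give the smallest and largest values of type T
  writeln('byte:     ', Low(Byte), ' to ', High(Byte));
  writeln('smallint: ', Low(SmallInt), ' to ', High(SmallInt));
  writeln('longint:  ', Low(LongInt), ' to ', High(LongInt));
end.
```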
At the hardware level, virtually every computer's memory is organized as a series of bytes. We often measure memory in KB (kilobytes), MB (megabytes) or GB (gigabytes). For example, 1 KB = 1,024 bytes.
Computers generally store text using a coded character set, which assigns a unique number to each of a set of characters. Two character sets are used in virtually all software systems today:
The ASCII character set assigns a 7-bit value (i.e. an integer from 0 to 127) to each character. For example, in ASCII the character 'A' has the number 65, and 'B' has the number 66. ASCII includes all the characters you see on a standard English-language keyboard: the uppercase and lowercase letters A-Z/a-z of the Latin alphabet, the numbers 0-9 and various punctuation marks such as $, % and &. ASCII does not include accented characters such as č or ř.
The newer Unicode character set extends ASCII to include the characters of essentially all of the world's writing systems, including accented characters and also ideographic characters in Asian languages such as 日. Unicode assigns each character a code point, which is an integer from 0 to 1,114,111 (hexadecimal 10FFFF); every code point fits in 21 bits.
Clearly every ASCII character can fit in a byte, and some Unicode characters can only be represented using a sequence of bytes.
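Pascal's built-in ord and chr functions (not otherwise covered in this section) convert between a character and its code, which makes the ASCII numbering described above easy to see. A doubled quote '' inside a Pascal string stands for a single quote character.

```pascal
begin
  writeln('ord(''A'') = ', ord('A'));   // the character code of 'A'
  writeln('ord(''B'') = ', ord('B'));
  writeln('chr(97) = ', chr(97));        // the character whose code is 97
end.
```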
Pascal programs can manipulate various types of data. In a Pascal program, every variable, expression and value has a specific type.
We will learn about various Pascal types throughout this course. We will begin with these 5 basic types:
integer - a 16-bit signed integer or a 32-bit signed integer, depending on whether you have enabled Delphi mode (see below)
real - a floating-point number, i.e. a number such as 2.347868 that can have digits after the decimal point. real values have a much larger range than integers: for example, a real can contain the number $10^{50}$. However, real values are stored only approximately: they have only 15-16 significant digits (possibly fewer on some platforms), so a very large or very small value, or even a simple fraction such as 0.1, may not be represented exactly.
boolean - either true or false.
char - a single-byte character. Any ASCII character can fit in a char; in general Unicode characters cannot.
string - a sequence of characters. Without Delphi mode, a string is limited to 255 characters. If you enable Delphi mode, a string may have any length. A string may contain non-ASCII characters such as ř. Such characters are represented using multiple bytes in the string.
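The sketch below declares one variable of each of these five types and prints them all with a single writeln. It uses variable declarations and assignment statements, which are explained later in these notes; the variable names are illustrative, and the :0:6 after ratio is a formatting option (not covered here) that asks for 6 digits after the decimal point. Free Pascal prints boolean values in upper case.

```pascal
var
  count: integer;
  ratio: real;
  done: boolean;
  initial: char;
  name: string;
begin
  count := 42;
  ratio := 2.347868;
  done := false;
  initial := 'P';
  name := 'Pascal';
  // ratio:0:6 prints the real with 6 digits after the decimal point
  writeln(count, ' ', ratio:0:6, ' ', done, ' ', initial, ' ', name);
end.
```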
Here is a first Pascal program that is about as simple as it gets:
begin
  writeln('hello, world');
end.
To execute this program on your computer, you'll first need to type it into a text editor. You'll then save it to a file with a name such as 'hello.pas'. You can then compile the program and run the resulting executable. For more information about how to do that, see my page on editing and compiling Pascal programs.
When you run the program, it will produce this output:
hello, world
Note the following in the program above:
The statement to be executed is enclosed by begin and end.
The statement is followed by a semicolon (;). We will almost always write a semicolon after every statement in a Pascal program.
There is a period (.) at the end of the program. This period is required.
To enable the Delphi dialect of Free Pascal, include this at the top of your source file:
{$mode delphi}
This has important consequences:
integer represents a 32-bit signed integer rather than a 16-bit signed integer.
string values are not limited to 255 characters.
I recommend that you put this directive at the top of every program. (To save space, however, I will not generally write it at the top of programs in these lecture notes.)
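To see the first consequence directly, this sketch prints the largest integer value using the built-in High function and the size of an integer in bytes using the built-in SizeOf function. With the directive it should print 2147483647 and 4; if you remove the directive, you should instead see 32767 and 2.

```pascal
{$mode delphi}
begin
  writeln('largest integer: ', High(integer));     // 2^31 - 1 in Delphi mode
  writeln('bytes per integer: ', SizeOf(integer)); // 32 bits = 4 bytes
end.
```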
A comment is text that the compiler will ignore. Good programmers often add comments to their code to explain to others (or even to themselves) what a program is doing.
There are three different syntaxes for comments in a Pascal program:
// 1. this is a single-line comment
{ 2. this is a comment,
  and may span multiple lines }
(* 3. this comment can also
   span multiple lines *)
Generally I recommend using either syntax 1 or 2 above.
To do much of anything in a Pascal program you will need some variables. A variable is a named storage location that holds a value of a particular type; that value can change over time as your program runs.
A variable declaration looks like this:
var
  x: integer;
  r: real;
You may declare several variables of the same type together:
var s, t, u: string;
When declaring a single variable, you may give it an initial value:
var a: integer = 4;
An assignment statement puts a new value into a variable. Here is a program that uses some assignment statements:
var
  x: integer;
  y: integer;
begin
  x := 4;
  y := x + 1;
  x := y + 2;
  writeln('now x is ', x, ' and y is ', y);
end.
When you run this program, x and y are initially undefined because the program does not specify initial values for those variables when it declares them. The first assignment statement sets x to 4. The second sets y to x + 1 = 4 + 1 = 5. Then the third sets x to y + 2 = 5 + 2 = 7. Note that this statement changes the value of x.
The program will print
now x is 7 and y is 5
Note that a single call to writeln can write out multiple values. We will use this capability often. Also note that writeln can write values of any of the types we have seen so far (i.e. integer, real, boolean, char, string).
An assignment statement can include the same variable name on the left and right sides. For example, the statement
x := x + 1;
adds one to the existing value of x. If x was 4 before the statement executes, it will be 5 afterward.
The program in the preceding section uses the arithmetic operator +. Pascal includes these arithmetic operators:
+ : addition
- : subtraction
* : multiplication
/ : floating-point division
div : integer division (truncating toward zero)
mod : integer remainder
The +, - and * operators can operate on integers or reals. If given two expressions of type integer, these operators will return an integer; otherwise they return a real value.
/ can operate on integers or reals; it always returns a real. That means that this program will not compile:
var x: integer = 4;
begin
  x := x / 2;   // ERROR: the / operator returns a real, which cannot be stored in an integer variable
  writeln(x);
end.
The div operator can operate only on integers, and returns an integer. For example, 23 div 3 = 7, since when we divide 23 by 3 we find that the result is 7 excluding the remainder. In other words, when we truncate the fraction 23/3 toward zero we get the integer 7.
We can rewrite the program above as follows:
var x: integer = 4;
begin
  x := x div 2;   // compiles with no problem
  writeln(x);
end.
The mod operator can operate only on integers, and returns an integer. As an example, 23 mod 3 = 2, which is the remainder after dividing 23 by 3.
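Because div truncates toward zero, it is worth seeing what happens with a negative dividend. In Free Pascal the result of mod takes the sign of the left operand, so that (n div d) * d + (n mod d) always equals n. This sketch prints both cases; the variable name n is illustrative.

```pascal
var
  n: integer;
begin
  n := 23;
  writeln(n, ' div 3 = ', n div 3);
  writeln(n, ' mod 3 = ', n mod 3);
  n := -23;
  writeln(n, ' div 3 = ', n div 3);   // -23/3 is about -7.67, truncated toward zero
  writeln(n, ' mod 3 = ', n mod 3);   // the remainder has the sign of n
  writeln('check: ', (n div 3) * 3 + n mod 3);   // reconstructs n
end.
```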
You can use readln to read a value from a program's standard input (which is usually the keyboard). Here is a program that reads two numbers and prints their sum:
var x, y: integer;
begin
  write('Enter x: ');
  readln(x);
  write('Enter y: ');
  readln(y);
  writeln('The sum is ', x + y);
end.
If we run this program and enter some numbers, the output might look like this:
Enter x: 4
Enter y: 7
The sum is 11
Note that in this program we used write rather than writeln. The difference between these is that writeln writes a newline character at the end of its output, which causes the cursor to move to the beginning of the next line. write does not do this. If we had used writeln rather than write in this program, the output might look like this:
Enter x:
4
Enter y:
7
The sum is 11
Similarly, there are separate procedures read and readln. readln reads one or more values from a single line of input, and then discards the rest of the input line. read also reads one or more values, but remains on the same input line so that a subsequent call to read (or readln) can read more values from that line.
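The difference matters when several values appear on one input line. In this sketch, read takes the first number and leaves the rest of the line in place, so the following readln can still find the second number there. (A single call readln(x, y) would have the same effect.)

```pascal
var
  x, y: integer;
begin
  write('Enter two numbers on one line: ');
  read(x);     // reads the first number; stays on the same input line
  readln(y);   // reads the second number, then discards the rest of the line
  writeln('The sum is ', x + y);
end.
```

If the user types 4 7 and presses Enter, the program prints The sum is 11.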
We have seen how to declare variables. It will also sometimes be useful to declare constants:
const Seconds = 60;
The expression in a constant declaration may contain other constants and operators, but no variables:
const Microseconds = Seconds * 1000 * 1000;
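Here is a minimal complete program using the two constants above. The comments assume that Seconds means seconds per minute, which the declaration itself does not say. Because a constant cannot change, an assignment to it is rejected by the compiler; that line is left commented out.

```pascal
const
  Seconds = 60;                          // assumed: seconds per minute
  Microseconds = Seconds * 1000 * 1000;  // evaluated by the compiler
begin
  writeln('seconds: ', Seconds);
  writeln('microseconds: ', Microseconds);
  // Seconds := 61;   // would not compile: a constant cannot be assigned
end.
```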
The if statement executes one or more statements if a condition is true:
if x > 3 then
begin
  y := x;
  z := x + 1;
end;
If there is only one statement to be executed, you may omit the begin and end.
An if statement may optionally include an else clause indicating one or more statements to be executed if the condition is false:
if x > 3 then
  y := x + 1
else
begin
  z := x + 1;
  y := x - 1;
end;
Warning: do not put a semicolon before the else, or the compiler will complain! This is the one place in Pascal where a statement must not be followed by a semicolon.
Here is a program that uses if to print a message indicating whether a number is greater than 7:
var x: integer;
begin
  write('Enter x: ');
  readln(x);
  if x > 7 then
    writeln('greater than 7')
  else
    writeln('not greater than 7');
end.
The program in the preceding section uses the comparison operator > (greater than). Pascal includes these comparison operators:
= : equal
<> : not equal
< : less than
> : greater than
<= : less than or equal
>= : greater than or equal
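These operators can compare two values of the same primitive type (boolean, integer, real, char, string); an integer may also be compared with a real. Strings are compared alphabetically, character by character using their character codes. For example, 'Zebra' < 'apple' is true, because in ASCII every uppercase letter comes before every lowercase letter.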
Note especially the difference between the assignment operator := and the comparison operator =. If you want to assign a value to a variable in Pascal, you must use :=.
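The sketch below contrasts the two operators: := stores a value in x, while = (like the other comparison operators) produces a boolean result, which Free Pascal's writeln prints as TRUE or FALSE. The last line shows that a string which is a prefix of a longer string, such as 'cat', sorts before it.

```pascal
var
  x: integer;
begin
  x := 5;                                  // := assigns a value to x
  writeln('x = 5: ', x = 5);               // = compares; the result is a boolean
  writeln('x <> 5: ', x <> 5);
  writeln('apple < banana: ', 'apple' < 'banana');
  writeln('cat < catalog: ', 'cat' < 'catalog');
end.
```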
A for…to statement loops over a series of values. For each value in the series, it executes the body of the loop, which consists of one or more statements. On each iteration of the loop, the loop variable is set to the next value in the series.
Here is a program that uses for to print all values from 1 to 10:
var
  i: integer;
begin
  for i := 1 to 10 do
    writeln(i);
end.
You must declare the loop variable (i in this case) in a var section before you can use it in a for statement. The loop variable may be of type integer or char.
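Since a char may also serve as the loop variable, a loop can run over a range of letters, following the order of their character codes. The variable name c is illustrative.

```pascal
var
  c: char;
begin
  for c := 'a' to 'e' do
    write(c);   // print each letter with no separator
  writeln;      // end the output line
end.
```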
Just as with the if statement, you can use begin and end to enclose multiple statements in the loop body. For example:
for i := 1 to n do
begin
  writeln(i);
  writeln(i * 2);
end;
To loop through decreasing values, use downto rather than to:
for i := 10 downto 1 do
  writeln(i);
writeln('liftoff');
Let's write a program that reads an integer N such as 5 and prints output such as the following:
1 + 2 + 3 + 4 + 5 = 15
We can accomplish this by using a for statement to loop through the values from 1 to N. In the loop body we will perform two tasks:
Add the current value to the running total, so that at the end of the loop we will know the sum of all the numbers.
Write out the current value.
Here is the program:
var
  n: integer;
  i: integer;
  sum: integer = 0;
begin
  write('Enter n: ');
  readln(n);
  for i := 1 to n do
  begin
    sum := sum + i;
    write(i);
    if i < n then
      write(' + ');
  end;
  writeln(' = ', sum);
end.
Study this program and make sure you understand why the if statement in the loop body is necessary.