Week 7: Notes

The 'var' keyword

When you declare a local variable, you can use the var keyword to tell the compiler to infer its type. For example:

var s = "distant hill";

This is equivalent to

string s = "distant hill";

var does not allow a variable to hold values of any type at all! C# is statically typed, so any assignment to a variable declared with var must match the type that the compiler inferred. For example:

var s = "distant hill";
s = 14;    // error: cannot implicitly convert 'int' to 'string'

I personally don't use var much. I think code is generally easier to read when variable types are explicit.

Linq methods

The System.Linq namespace contains many methods that operate on sequences, i.e. objects that implement IEnumerable<T>. These are extension methods, meaning that they are implemented in the System.Linq namespace rather than in IEnumerable<T> itself. However for the purpose of calling these methods that makes no difference, and you can invoke them directly on any enumerable object. Recall that all the C# collection classes inherit directly or indirectly from IEnumerable<T>, so these methods work with any C# collections such as stacks or dictionaries.

Useful Linq methods include Select(), which maps a function over a sequence, and Where(), which filters a sequence, keeping only the elements for which a given condition is true. Most of these methods return a new sequence, i.e. another IEnumerable<T>. You can call the ToArray() or ToList() methods to gather a sequence's elements into an array or list.

There are many other useful methods such as Concat(), OrderBy(), and Reverse(). You can read about these in our quick reference documentation.

As one example of using the Linq methods, consider Project Euler's problem #1:

Find the sum of all the multiples of 3 or 5 below 1000.

In Python we can solve this in a single line using a list comprehension:

>>> sum([x for x in range(1000) if x % 3 == 0 or x % 5 == 0])
233168

C# does not have list comprehensions, but we can imitate them using the Linq methods. Here is a one-line solution in C#:

int sum = Enumerable.Range(0, 1000).Where(x => x % 3 == 0 || x % 5 == 0).Sum();

Notice the lambda expression x => x % 3 == 0 || x % 5 == 0 in this code.

The C# method Enumerable.Range() is similar to Python's range() function: it can produce an enumeration of integers. However, the arguments are slightly different: Enumerable.Range(x, y) produces an enumeration of y integers starting at x, which is like range(x, x + y) in Python.

most common words in a document

More commonly you will use the Linq methods in conjunction with the C# collection classes. For example, suppose that we want to find the most common words in War and Peace, considering only words that are at least 5 characters long. Here is a solution in C# using Linq:

using System.Text.RegularExpressions;

Dictionary<string, int> count = [];

foreach (string line in File.ReadLines("war_and_peace.txt"))
    foreach (Match match in Regex.Matches(line, "[A-Za-z]+")) {
        string word = match.Value.ToLower();
        count[word] = count.GetValueOrDefault(word) + 1;
    }

foreach (KeyValuePair<string, int> p in
            count.Where(kv => kv.Key.Length >= 5)
                 .OrderByDescending(kv => kv.Value).Take(20))
    Console.WriteLine($"{p.Key} {p.Value}");

Notice several things about this program:

We can easily use regular expressions (which we studied in Programming 1) in C#. In particular, Regex.Matches() returns a list of match objects, each representing a match of a regular expression. If m is a match object, then m.Value is the text that matched. Our quick reference guide has more detailed information about regular expression methods in C#.
The GetValueOrDefault() method returns either the value for a dictionary key, or a default value (which is 0 for the int type) if the key is not present.
A dictionary in C# is a collection of KeyValuePair objects. And so if you iterate over a dictionary with foreach you will get a KeyValuePair on each iteration. (That's unlike Python, where iterating over a dictionary produces only the keys.) Similarly, if you invoke a Linq method such as Where() or OrderByDescending() on a dictionary you are processing a series of KeyValuePair objects.
You can pass a delegate to OrderBy() or OrderByDescending(). That is like calling sort() with a key function in Python.
The Linq method Take(n) takes only the first n elements of an enumeration. That is convenient if you want to produce only a limited amount of output.

Linq syntax

C# includes special syntax for calling some of the Linq methods. For example, instead of

int[] b = a.Where(i => i > 10).Select(i => i + 10).ToArray();

we can write

int[] b = (from i in a where i > 10 select i + 10).ToArray();

This syntax is similar to SQL, a database query language. We will not study the Linq syntax more in this course, but you are welcome to use it if you like.

solving combinatorial problems with recursion

We can use recursion to solve many problems of a combinatorial nature, such as generating subsets, combinations, or permutations of a set. As we solve these problems we can use recursion to explore a tree of possibilities.

As a first example, suppose that we'd like to write a function that takes an integer n and prints out all possible strings of length n containing only characters from the set {'a', 'b', 'c'}. For example, if n = 2 then we will print these strings:

aa
ab
ac
ba
bb
bc
ca
cb
cc

And if n = 3:

aaa
aab
aac
aba
abb
abc
aca
acb
acc
baa
bab
bac
…

There will be 3ⁿ strings in the output list. That's because there are 3 possible choices for the first letter, 3 for the second letter, and so on for each of the n letters.

We can imagine a series of choices that we must make as we choose a string to print out. We can draw these in a tree. For example, if n = 3:

At each step, we choose the letter 'a', 'b' or 'c'. Affter we have made N choices, we have a string that we can print.

We can use recursion to descend this tree. At each level of the tree, we'll make three recursive calls, one for each choice that we can make (a, b, or c). As we descend the tree, we will remember all choices that we have made so far. We can accomplish that in several different ways, each of which will lead to a slightly different solution. As we'll see below, a non-recurisve solution to this problem is also possible.

Solution 1: using a prefix parameter

As the recursion descends, we can accumulate choices into a prefix string that we pass to each recursive call. When we have made N choices, we have reached a leaf of the tree and will print out the string we have built.

Here is our solution:

void print(string s) => Console.WriteLine(s);

void abc(int n) {
    void go(string prefix) {
        if (prefix.Length == n)
            print(prefix);
        else
            foreach (char c in "abc")
                go(prefix + c);
    }
    go("");
}

Solution 2: using a stack

As another possibility, as the recursion descends we can push all choices we've made so far onto a stack. Once we have made N choices, we can print out the values on the stack. Each time the recursion returns, we pop our last choice off the stack, restoring its previous state. This solution might look like this:

void abc2(int n) {
    Stack<char> stack = new();

    void go(int i) {
        if (i == n)
            print(string.Join("", stack.Reverse()));
        else
            foreach (char c in "abc") {
                stack.Push(c);
                go(i + 1);
                stack.Pop();
            }
    }

    go(0);
}

We call the Linq method Reverse() above so that we print out characters in the order we chose them. For example, if we choose 'b', then 'a', then 'c', then iterating over the stack will produce the values 'c', 'a', and 'b' in that order. Reversing produces the order 'b', 'a', 'c'.

In solution 1 above, we never have to undo a choice that we have made, because the expression 'prefix + c' creates a copy of the prefix string with the additional character c. The original prefix string remains unchanged (as it must be, since strings are immutable in C#). However, this stack-based solution may be slightly more efficient because it never copies the set of choices accumulated so far.

When we study larger problems that explore an exponential state space, we will often see this same choice: we may either (a) copy the state each time we recurse, or (b) modify the state before recursing, then undo the change afterwards.

solution 3: using an array

As yet another possibility, we can store all choices in an array. Our solution will look like this:

void abc3(int n) {
    char[] a = new char[n];

    void go(int i) {
        if (i == n)
            print(string.Join("", a));
        else
            foreach (char c in "abc") {
                a[i] = c;
                go(i + 1);
            }
    }

    go(0);
}

This approach works nicely for this problem because all solutions have the same number of elements. (For many combinatorial problems that will not be true; for example, see the subsets problem below.)

solution 4: extracting digits

We can actually solve this particular problem non-recursively, by extracting digits from an integer. To see how this will be possible, suppose that instead of the characters 'a', 'b' and 'c' we want to generate the digits 0, 1, and 2. Then our output for N = 3 will look like this:

This is simply all 3-digit numbers in base 3!

For a given N, there are 3^N N-digit numbers in base 3, which is exactly the number of possible solutions that we saw before. So we may simply loop through integers from 0 to 3^N – 1, extract a series of base-3 digits from each, and convert each such digit to a character 'a', 'b' or 'c'. Here is a possible solution:

void abc4(int n) {
    string convert(int i) {
        string s = "";
        for (int k = 0 ; k < n ; ++k) {
            s = "abc"[i % 3] + s;
            i = i / 3;
        }
        return s;
    }

    int p = (int) Math.Pow(3, n);

    for (int i = 0 ; i < p ; ++i)
        print(convert(i));
}

subsets of a set

Another typical combinatorial problem is generating all subsets of a set. Specifically, suppose that for any N we'd like to generate all subsets of the set {1, 2, …, N}. For N = 3, our output might look like this (in some order):

{ }
{ 1 }
{ 2 }
{ 1 2 }
{ 3 }
{ 1 3 }
{ 2 3 }
{ 1 2 3 }

One way to view this problem is as a series of N choices, one for each of the values {1, 2, …, N}. In each choice, we decide whether a certain value will be absent from the set or present in it. Each of these N choices is a binary decision, and so there will be 2^N possible solutions.

We can solve this problem using any of the techniques that we saw above for the abc problem. Let's write a solution using a prefix parameter:

void subsets(int n) {
    void go(string prefix, int i) {
        if (i > n)
            print("{" + prefix + "}");
        else {
            go(prefix + i + " ", i + 1);   // i is in the subset
            go(prefix, i + 1);             // i is not in the subset
        }
    }

    go(" ", 1);
}

Our decisions are binary, so our recursive function calls itself twice, once for each possible choice.

Let's run it:

subsets(3);

This will produce this output:

{ 1 2 3 }
{ 1 2 }
{ 1 3 }
{ 1 }
{ 2 3 }
{ 2 }
{ 3 }
{ }

As another possible approach, at every recursive step we can either choose the next value that will be in the subset, or decide that there will be no more values and so the subset is complete. That leads to a solution that looks like this:

void subsets2(int n) {
    void go(string prefix, int i) {
        print("{" + prefix + "}");  // add no more values
        for (int j = i ; j <= n ; ++j)
            go(prefix + j + " ", j + 1);  // add value j and recurse
    }

    go(" ", 1);
}

Let's run it:

subsets(3);

This will produce the same subsets as the previous solution, but in a somewhat different order:

{ }
{ 1 }
{ 1 2 }
{ 1 2 3 }
{ 1 3 }
{ 2 }
{ 2 3 }
{ 3 }

coin sums

As another example of a combinatorial problem, consider Czech coins, which have the following denominations: 1 Kč, 2 Kč, 5 Kč, 10 Kč, 20 Kč, 50 Kč. Suppose that we'd like to print out all possible ways of forming a sum N using Czech coins. For the moment, let's assume that order matters, e.g. 1 + 2 and 2 + 1 are distinct sums.

We can solve this problem using recursion. Here is a possible solution, using a stack to remember the choices we have made:

int[] coins = { 1, 2, 5, 10, 20, 50 };

void sum(int n) {
    Stack<int> nums = [];

    // Add coins to make the remaining sum (rem).
    void go(int rem) {
        if (rem == 0)
            print(string.Join(" + ", nums.Reverse()));
        else
            foreach (int c in coins)
                if (c <= rem) {
                    nums.Push(c);
                    go(rem - c);
                    nums.Pop();
                }
    }

    go(n);
}

Our recursive helper function go() takes an integer rem indicating the remaining sum that we need to form. If rem is 0, we have found a solution, so we print out the string. Otherwise we recurse once for every possible coin size that is less than or equal to rem, subtracting it from the remaining sum that we need to form.

The call sum(5) will produce this output:

1 + 1 + 1 + 1 + 1
1 + 1 + 1 + 2
1 + 1 + 2 + 1
1 + 2 + 1 + 1
1 + 2 + 2
2 + 1 + 1 + 1
2 + 1 + 2
2 + 2 + 1
5