Week 6: Notes

Object class

All C# class automatically inherit from a top-level class called Object which contains a modest number of methods that are shared by all objects. One of these methods is ToString:

virtual string ToString ();

C# automatically calls this method when it needs to convert any object to a string, for example when writing an object to the console:

Window w = new Window();

Console.WriteLine(w);   // will automatically call w.ToString()

The default implementation of ToString simply returns the name of the class, so the above code will write

Window

You can override ToString() in your own classes if you'd like to change their printed representation. (ToString() is like the __repr magic method in Python, which you may remember.)

Object contains a couple of other methods (Equals, GetHashCode) that we will consider when we discuss the C# collection classes below.

interfaces

In Introduction to Algorithms we learned about various abstract data types, such as stacks and queues. For each abstract data type, we described the methods that it provides. For example, a stack has methods push(), pop() and isEmpty(). We saw that we can implement each abstract data type in various ways. For example, we can implement a stack either using a linked list, or an array. Different implementations of an abstract data type will have equivalent functionality, though they might have different performance characteristics.

C# interfaces formalize this concept. An interface defines a set of methods or other members in an abstract data type. For example, here is an interface for a stack of integers:

interface Stack {
  bool isEmpty();
  void push(int i);
  int pop();
}

There is no implementation yet - just a set of methods to be implemented. We can write a concrete class LinkedStack that implements a stack using a linked list. It might begin like this:

class LinkedStack : Stack {
  Node head;

  public bool isEmpty() { return (head == null); }
  
  public void push(int i) {
    head = new Node(i, head);
  }
  
  public int pop() { … }
}

Above, "class LinkedStack : Stack" means that we are defining a class LinkedStack that implements the Stack interface. The compiler will check that our class provides an implementation of every method that was declared in the interface. If any implementations are missing, we will get a compiler error.

We might then write a second class ArrayStack that implements a stack using a fixed-size array. It could begin like this:

class ArrayStack : Stack {
  int[] a;
  int num;  // number of elements currently on the stack

  public ArrayStack(int maxSize) {
    a = new int[maxSize];
  }

  public bool isEmpty() { return num == 0; }

  …
}

We now have two different concrete implementations of a Stack. We might create two Stack objects as follows:

Stack s = new LinkedStack();
Stack t = new ArrayStack(100);

Notice that a variable can have an interface type, such as the variables s and t above. A variable with an interface type can hold any object that implements that interface.

Method parameters can also have an interface type. For example, here is a method that will empty a Stack, printing all values as it does so:

static void empty(Stack s) {
  while (!s.isEmpty()) {
    int i = s.pop();
    WriteLine(i);
  }
}

We may pass either a LinkedStack or an ArrayStack to this method. For example:

ArrayStack as = new ArrayStack(100);
for (int i = 0 ; i < 5 ; ++i)
  as.push(i);
empty(as);

Notice that an interface may not contain a constructor. Any class that implements an interface may have its own constructor(s), but those are not part of the interface.

An interface may, however, contain properties and indexers. For example, let's modify the Stack interface so that isEmpty is a read-only property, not a method:

interface Stack {
  bool isEmpty { get; }
  void push(int i);
  int pop();
}

All members in an interface are implicitly public, so you should not write the keyword public in an interface declaration. However, when you write a class that implements an interface, all implementations of the interface's members must be declared public (as you can see in the examples above).

Note that a class may implement more than one interface. For example, let's write an interface Countable for any type that has a current element count:

interface Countable {

  int count { get; }
}

Now we can modify our ArrayStack class so that it implements both Stack and Countable:

class ArrayStack : Stack, Countable {
  int[] a;
  int num;  // number of elements currently on the stack

  public ArrayStack(int maxSize) {
    a = new int[maxSize];
  }

  public bool isEmpty => (num == 0);  // implementation of property from Stack

  public int count => num;  // implementation of property from Countable

  …

}

interface inheritance

Up to now we have only discussed inheritance between classes. C# also allows an interface to inherit from another interface. An inheriting interface has all the members of its parent interface, and can add additional members of its own. Here is a somewhat abstract example:

interface Shape {
  double perimeter();
  double area();
}

interface Polygon : Shape {
  int num_vertices { get; }
}

class Circle : Shape { … }

class Rectangle : Polygon { … }

class Triangle : Polygon { … }

In this example, a Shape represents a shape in two dimensions and a Polygon is a Shape that has a finite number of vertices. Because the Triangle class implements Polygon, it must provide an implementation of num_vertices as well as of the perimeter() and area() methods.

single and multiple inheritance

C# provides

single class inheritance: a class may inherit from only one other class
multiple interface inheritance: an interface may inherit from multiple interfaces

This particular combination was popularized by Java. Many object-oriented languages (including Python) have a more general mechanism: they have multiple class inheritance, so that one class may have several immediate superclasses. That is more powerful, but creates various complexities that C# avoids through its simpler model.

generic classes

We have now learned enough C# that we can implement data structures such as stacks, queues, and binary trees. However, in the subset of C# that we know so far, each of our implementations will be tied to some specific type. For example, we can write a stack class that holds integers:

class IntStack {
  int[] a;
  int num;  // number of elements currently on the stack

  public IntStack(int maxSize) {
    a = new int[maxSize];
  }
}

Similarly, we can write a stack that holds strings:

class StringStack {
  string[] a;
  int num;  // number of elements currently on the stack

  public StringStack(int maxSize) {
    a = new string[maxSize];
  }
}

But this is a significant limitation - we certainly don't want to have to rewrite our data structure classes once for every possible element type!

In Python last semester, we had no such issue because Python is dynamically typed. When we wrote a stack class in Python, we could add anything at all to it:

# Python code
s = new Stack()

s.push(14)

s.push('hello')

s.push([1, 2, 3])

C# is different: it is a statically typed language, meaning that every variable has a type. And so C# provides a different mechanism called generic classes. Generic classes let us write a container class only once and reuse its implementation for every possible element type.

A generic class has one or more type parameters. For example, we can implement a class called FixedStack<T>. This means "a fixed-size stack of objects of type T". Our class might look like this:

class FixedStack<T> {
  T[] a;
  int num;  // number of elements currently on the stack

  public FixedStack(int maxSize) {
    a = new T[maxSize];
  }

  public void push(T val) {
    a[num] = val;
    num += 1;
  }

  public T pop() {
    num -= 1;
    T ret = a[num];
    return ret;
  }
}

In the class above, T is a type parameter. When the caller creates a FixedStack, they will specify the type T:

FixedStack<int> s = new FixedStack<int>(100);  // create a stack of ints
s.push(15);
s.push(25);

FixedStack<string> t = new FixedStack<string>(100);  // create a stack of strings
t.push("hello");
t.push("goodbye");

Notice that inside the class FixedStack<T>, the type T can be used like any other type: we can create an array of objects of type T, or use T as a method parameter type, a method return type or a variable type.

Observer that a FixedStack is not quite like our stack implementation in Python, where a single stack could hold objects of various types. In C#, by contrast, each FixedStack is tied to some particular element type. That is a good thing: if we create a FixedStack<int>, intending to hold ints, and then attempt to push a string to it by mistake, we will get a compile-time error.

By the way, the word T above is arbitrary: a type parameter can have any name you like. It is traditional, however, in C# (and other languages with generics) to use "T" for classes that take a single type parameter.

multiple type parameters

A single class may take multiple type parameters. For example, suppose that we want to write a Dictionary class that maps keys to values using a hash table. It might look like this:

class Dictionary<K, V> {
  …

  public void add(K key, V val) { … }

  public bool contains(K key) { … }
}

Here, K and V are two type parameters. When the caller creates a Dictionary, they will specify types for K and V:

Dictionary<int, string> d = new Dictionary<int, string>();   // maps int → string
d.add(10, "sky");
d.add(20, "purple");

Dictionary<string, double> e = new Dictionary<string, double>();  // maps string → double
e.add("purple", 55.2);

Let's consider how we could implement such a class. As we learned in Intro to Algorithms, a hash table contains an array of hash chains, each of which consists of a linked list of nodes. So we will need a Node class that is itself generic:

class Node<K, V> {
  public K key;
  public V val;
  public Node<K, V> next;

  public Node(K key, V val) {
    this.key = key; this.val = val;
  }
}

Notice that inside a generic class such as Node, we must specify the type parameters <K, V> when we refer to the class itself (e.g in the field declaration Node<K, V> next), but the constructor declaration does not include the type parameters.

Now, inside the Dictionary class we could have a field that holds an array of hash chains of appropriate type:

class Dictionary<K, V> {
  Node<K, V> a[];
}

generic interfaces

We discussed generic classes above. An interface may also be generic. For example, let's revisit the Stack interface we wrote before, which described a stack of ints:

interface Stack {
  bool isEmpty { get; }
  void push(int i);
  int pop();
}

We will now modify it to be a generic interface Stack<T>, which is a stack of elements of type T:

interface Stack<T> {
  bool isEmpty { get; }
  void push(T i);
  T pop();
}

Above, we wrote a class FixedStack<T>, implementing a fixed-size stack of elements of type T. We can now modify that class so that it implements the Stack<T> interface:

class FixedStack<T> : Stack<T> {
  T[] a;
  int num;  // number of elements currently on the stack
… 
}

Once again, a variable may have an interface type:

Stack<double> s = new FixedStack<double>(100);

s.push(2.0);
s.push(4.0);

IComparable

Many types are naturally ordered: for any two values, we can say whether one is greater or less than the other. For example, we can compare integers or doubles in the obvious fashion, or can compare strings alphabetically. For other types, an ordering may not be so obvious: for example, does it make sense to say that one stack is greater or less than another? Perhaps not.

The C# standard library contains a generic interface IComparable<T> that is implemented by all ordered built-in types. IComparable<T> means "comparable with objects of type T". For example, the built-in type int implements IComparable<int>, since integers are comparable with integers. Similarly, string implements IComparable<string>.

IComparable<T> is defined like this:

interface IComparable<T> {
  int CompareTo (T other);
}

The CompareTo method returns

a negative number if this object is less than other
0 if this object equals other
a positive number if this object is greater than other

For example:

WriteLine(4.CompareTo(7));  // writes -1, since 4 < 7

IComparable<T> is useful because it enables us to write generic classes that compare their elements, as we will see in the next section.

generic class constraints

The generic class FixedStack<T> that we wrote above treats its elements as opaque values: it stores them, but does not know anything about their behavior.

Now suppose that we want to write a class MaxStack<T> which holds a stack of values of type T, and also provides a method T max() that returns the largest value currently on the stack. You might think that we could implement the class like this:

class MaxStack<T> {
  T[] a;
  int num;

  …

  T max() {
    T v = a[0];

    // Scan the stack, looking for the largest element.
    for (int i = 1 ; i < num ; ++i)
       if (a[i] > v)  // we found a larger element
          v = a[i];  // remember it
    return v;
  }
}

This code will not compile. The problem is that we can't use the > operator to compare two values of type T, because T might be some type that is not ordered – for example, T might be an array type, and arrays cannot be compared in C#.

So we will need to modify MaxStack so that it's only possible to create a MaxStack<T> if T is an ordered type. C# provides constraints for this purpose. As we saw in the previous section, any ordered type implements the IComparable<T> interface. So we can add a constraint to MaxStack<T> that says that T must implement IComparable<T>:

class MaxStack<T> where T : IComparable<T> {
  …
}

And now in the class we can implement the max() method like this:

  T max() {
    T v = a[0];

    // Scan the stack, looking for the largest element.
    for (int i = 1 ; i < num ; ++i)
       if (a[i].CompareTo(v) > 0)  // we found a larger element
          v = a[i];  // remember it
    return v;
  }

Notice that even if T is constrained to implement IComparable<T>, we still cannot use the < operator to compare two elements of type T – instead, we must call CompareTo(). In fact, in C# you may not ever use comparison operators to compare values of generic types, even the == operator! However, if T is known to implement IComparable<T> then you can test whether two values of type T are identical by calling CompareTo() and checking for the return value 0. It would be nice if C# allowed comparison operators in this context (and some other languages do, e.g. Kotlin), but it does not.

generic methods

So far we have used generics only to write entire classes. C# also allows us to write generic methods that take one or more type parameters, even outside a generic class. Occasionally this is useful. For example, suppose that we want to write a static method that swaps two values of any type T. We can write this as as a generic method:

public static void swap<T>(ref T a, ref T b) {
    T t = a;
    a = b;
    b = t;
  }

default

Suppose that we want to write a generic array class ExtArray<T> that holds elements of type T and has a special behavior: if the caller attempts to read an element that is out of bounds, then instead of an exception they will receive a default value in return. We would like this default value to be the "natural" default for the element type; for example, an ExtArray<int> should return 0 if an access is out of bounds, but an ExtArray<bool> should return false.

C# provides an operator default that provides this. For any type T, default(T) returns C#'s default value for that type:

    WriteLine(default(int));   // writes 0
    WriteLine(default(bool));   // writes false

Now we can implement ExtArray<T> as follows:

class ExtArray<T> {
  T[] a;
  int count;

  …

  public T this[int index] {
    // return default value if out of bounds
    get => index < count ? a[index] : default(T);

    set => …
  }
}

The C# collection classes

LIke many object-oriented languages, C# contains a set of collection classes in its standard library. These classes implement common data structures such as stacks, queues and hash tables. They are generic and make use of inheritance.

Here is a picture of the class hierarchy of some of the major collection classes and interfaces:

All of these classes and interfaces live in the namespace System.Collections.Generic.

In the picture, an arrow from A to B means that B implements or inherits from A.

All entities above whose names begin with the letter I (e..g ICollection, IDictionary) are interfaces. This is a standard naming convention in the C# standard library.

The picture above includes the following classes:

List<T> is a dynamic array, similar to a list in Python. List<T> contains an add() method for appending an element to the end of a List. It also contains an indexer so that you can get or set elements by index.
Stack<T> implements a stack using a dynamic array. It includes methods Push() and Pop().
Queue<T> implements a queue using a dynamic array, and includes methods Enqueue() and Dequeue().
The classes HashSet<T> and SortedSet<T> implement sets, with methods Add(), Contains() and Remove(). They differ in their underlying data structure: HashSet<> uses a hash table and SortedSet<T> uses a balanced binary tree.
Finally, the classes Dictionary<K, V> and SortedDictionary<K, V> implement dictionaries, with an indexer that can get/set values as well as a ContainsKey method. They differ in their underlying data structure: Dictionary<K, V> uses a hash table and SortedDictionary<K, V> uses a balanced binary tree.

You can read more details about these classes and interfaces in our Quick Library Reference. Note that you will need to follow the class hierarchy in order to find all the members available in each of these classes. For example, the methods available in List<T> include not only the ones listed under List<T> in the library reference, but also those listed under IList<T> and ICollection<T>, since List<T> implements those interfaces.

You can read even more details about the collection classes (including various classes not depicted above) in the official .NET API documentation.

Notice that all collection classes ultimately inherit from an interface IEnumerable<T>. This interface contains various methods that allow the foreach statement to iterate over a collection's values. We will not discuss these methods at this point, but you should certainly be aware that you can use foreach to iterate over any collection.

You will want to become fluent in using these collection classes. With them, you finally have easy access to the same useful data structures that we often used in Python, i.e. lists, sets and dictionaries.

Dictionaries in particular are quite useful, so let's look at the Dictionary<K, V> class a bit more. As you can see in the documentation, Dictionary<K, V> implements the interface IDictionary<K, V>, which in turn implements the interface ICollection<KeyValuePair<K, V>>. This means that a dictionary is a collection of key-value pairs. You can use foreach to iterate over a dictionary, and each element you get back will be a KeyValuePair<K, V>. Each KeyValuePair in turn has properties Key and Value. For example:

    Dictionary<string, int> d = new Dictionary<string, int>();
    d["yellow"] = 10;
    d["red"] = 15;
    d["blue"] = 20;
    
    foreach (KeyValuePair<string, int> p in d)
        WriteLine($"key = {p.Key}, value = {p.Value}");

overriding Equals and GetHashCode

Suppose that we have a class Vector representing a vector in an arbitrary number of dimensions:

class Vector {
  double[] a;
  
  public Vector(params double[] a) { this.a = a; }
}

We could certainly add more methods to obtain the length of a vector, add two vectors, and so on, but those are not important for our discussion here.

Let's create a HashSet<Vector> and add a couple of vectors to it:

HashSet<Vector> s = new HashSet<Vector>();
s.Add(new Vector(1.0, 2.0, 3.0));
s.Add(new Vector(3.0, 4.0, 5.0));

OK – so far so good. Now let's test whether the vector (1.0 2.0 3.0) is in the set:

WriteLine(s.Contains(new Vector(1.0, 2.0, 3.0)));
==> False

Contains has reported that the set does not contain that vector! The problem is that the HashSet does not consider the first vector we added to be equal to the vector we passed to Contains(), despite the fact that they contain the same numbers. So how can we tell HashSet that they should be equal?

As we briefly mentioned above, the top-level Object class contains a method Equals:

virtual bool Equals (object obj);  // return true if this object equals (obj)

HashSet calls this method to determine whether two objects are equal. By default, Equals just tests for reference equality, i.e. it returns true only if two objects are actually one and the same object. But we can override Equals in our class to specify a different notion of equality:

public override bool Equals(object o) {
    if (!(o is Vector))
      return false;

    Vector w = (Vector) o;
    
    if (a.Length != w.a.Length)
      return false;
      
    for (int i = 0 ; i < a.Length ; ++i)
      if (a[i] != w.a[i])
        return false;
    
    return true;
  }

Our comparison method first returns false if o is not actually a Vector. (This is a sneak preview of the is operator, which will we will study next week). Assuming that it is, we use a type cast to convert to a Vector, and then compare the vectors element by element.

Now the HashSet class should consider any two vectors with the same components to be identical. However overriding Equals is not sufficient. In fact, it we build the code above then the compiler warns us that we have another task to complete:

prog.cs(4,7): warning CS0659: 'Vector' overrides Object.Equals(object o) but does not override Object.GetHashCode()

As the compiler has suggested, we should also override the GetHashCode() method, which also belongs to the top-level Object class, and has this signature:

virtual int GetHashCode ();

Why must we override GetHashCode() whenever we override Equals()? The default implementation of GetHashCode() computes hash codes that might be different for any two different objects. But if two objects are considered equal by Equals(), such as two different Vector objects that have identical components, then they must have the same hash code so that a hash table implementation (such as HashSet) will look for them in the same hash bucket. This should make sense to you based on our study of hash tables in Intro to Algorithms. (And in fact you may remember that we encountered this same phenomenon in Python, where we needed to implement a magic method __hash() if we implemented a custom equality operator for a class.)

So let's expand our Equals class with a GetHashCode() method. For an aggregate object such as a Vector, the easiest approach is to combine the hash codes of the underlying values in some way. For example:

  public override int GetHashCode() {
    long hash = 0;
    
    foreach (double d in a)
      hash = hash * 1000003 + d.GetHashCode();
    return (int) hash;
  }

(You might recognize the prime constant 1,000,003 above from our discussion of hash functions in Introduction to Algorithms last semester.)

Now there are no compiler warnings, and looking up a Vector in a HashSet has the expected result:

HashSet<Vector> s = new HashSet<Vector>();
s.Add(new Vector(1.0, 2.0, 3.0));
s.Add(new Vector(3.0, 4.0, 5.0));
   
WriteLine(s.Contains(new Vector(1.0, 2.0, 3.0)));

==> True