Lecture 8

Here are notes about the topics we covered in Lecture 8. For more details, see the Essential C# textbook or the C# reference pages.

The Equals() and GetHashCode() methods

The top-level object class contains a method Equals():

  virtual bool Equals (object obj);

This method is distinct from the == operator. The default behavior of Equals and == is as follows:

for classes: both Equals and == test reference equality, i.e. they return true only if two objects are actually the same object
for structs:
- Equals tests structural equality: it returns true if two objects have the same type and their corresponding fields are equal
- == is not defined; an attempt to use it will result in a compiler error
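
Here is a small sketch of these defaults in action. The types PointClass and PointStruct are hypothetical and exist only for this illustration:

```csharp
using static System.Console;

class PointClass {             // hypothetical reference type
  public int x, y;
  public PointClass(int x, int y) { this.x = x; this.y = y; }
}

struct PointStruct {           // hypothetical value type
  public int x, y;
  public PointStruct(int x, int y) { this.x = x; this.y = y; }
}

class EqualityDemo {
  static void Main() {
    PointClass p = new PointClass(1, 2), q = new PointClass(1, 2);
    PointClass r = p;
    WriteLine(p.Equals(q));    // writes False: p and q are distinct objects
    WriteLine(p == q);         // writes False: == also compares references
    WriteLine(p == r);         // writes True: r refers to the same object as p

    PointStruct s = new PointStruct(1, 2), t = new PointStruct(1, 2);
    WriteLine(s.Equals(t));    // writes True: same type and equal fields
    // WriteLine(s == t);      // would not compile: == is not defined for PointStruct
  }
}
```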

When you write a class or struct, you can override the Equals method for your type and can also provide an overloaded == operator. In theory these could behave differently, which would be confusing. I recommend that if you provide a custom implementation of Equals for your type, you also customize == to behave in the same way, and vice versa. (In fact, if you overload == the compiler will require you to overload != as well, and will warn you if you have not also overridden Equals.)

object also contains a method GetHashCode():

  virtual int GetHashCode ();

If you override Equals for your type, you should also override GetHashCode, ensuring that two equal values will always have the same hash code. This will ensure that your type will work correctly as a hash table key. (In fact, if you override Equals without overriding GetHashCode, the compiler will issue a warning.)

Here is a partial implementation of a big number class with its own implementation of Equals, == and GetHashCode:

class BigNum {
  int[] digits;   // decimal digits, most significant first
  
  public static bool operator == (BigNum b, BigNum c) {
    // assuming no leading zeroes
    if (b.digits.Length != c.digits.Length)
      return false;
      
    for (int i = 0 ; i < b.digits.Length ; ++i)
      if (b.digits[i] != c.digits[i])
        return false;
    
    return true;
  }
  
  public static bool operator != (BigNum b, BigNum c) => !(b == c);
  
  public override bool Equals(object o) => (o is BigNum n) && (this == n);

  // calculate (this mod 2^32)
  public override int GetHashCode() {
    int h = 0;
    
    foreach (int d in digits)
      h = 10 * h + d;
      
    return h;
  }
}
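
To see why a consistent hash code matters, here is a minimal sketch that uses a type as a hash set element. The Point class is hypothetical and is simpler than BigNum: it overrides only Equals and GetHashCode.

```csharp
using System.Collections.Generic;
using static System.Console;

class Point {                  // hypothetical type for this illustration
  public readonly int x, y;
  public Point(int x, int y) { this.x = x; this.y = y; }

  public override bool Equals(object o) => o is Point p && x == p.x && y == p.y;

  // equal points always produce the same hash code
  public override int GetHashCode() => 31 * x + y;
}

class HashDemo {
  static void Main() {
    var set = new HashSet<Point>();
    set.Add(new Point(1, 2));
    set.Add(new Point(1, 2));                   // equal to the first point, so not added again
    WriteLine(set.Count);                       // writes 1
    WriteLine(set.Contains(new Point(1, 2)));   // writes True
  }
}
```

Without the two overrides, the set would treat each new Point as a distinct element, since the default behavior for classes is reference equality.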

generic methods

A method may be generic: it may take one or more type parameters. For example:

public static void swap<T>(ref T a, ref T b) {
  T t = a;
  a = b;
  b = t;
}

public static void fill<T>(T[] a, T t) {
  for (int i = 0 ; i < a.Length ; ++i)
    a[i] = t;
}
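
Here is a short usage sketch. The methods are repeated so that the example is self-contained; the class name GenericMethodDemo is just a placeholder. Note that the compiler can usually infer the type argument from the call, though you may also write it explicitly.

```csharp
using static System.Console;

class GenericMethodDemo {
  public static void swap<T>(ref T a, ref T b) {
    T t = a;
    a = b;
    b = t;
  }

  public static void fill<T>(T[] a, T t) {
    for (int i = 0 ; i < a.Length ; ++i)
      a[i] = t;
  }

  static void Main() {
    int x = 3, y = 4;
    swap(ref x, ref y);                   // T is inferred to be int
    WriteLine($"{x} {y}");                // writes 4 3

    string[] names = new string[3];
    fill<string>(names, "yo");            // T given explicitly
    WriteLine(string.Join(" ", names));   // writes yo yo yo
  }
}
```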

generic classes and interfaces

A class or interface may also be generic. Here's a generic version of our dynamic array class:

class DynArray<T> {
  T[] a = new T[10];
  int count;
  
  public int length {
    get => count;
  }
  
  public void add(T t) {
    if (count == a.Length) {
      T[] b = new T[count * 2];
      for (int j = 0 ; j < count ; ++j)
        b[j] = a[j];
      a = b;
    }
    
    a[count++] = t;
  }
  
  public T this[int index] {
    get => (index >= 0 && index < count) ? a[index] : default(T);
    set => a[index] = value;
  }
  
  public bool contains(T t) {
    for (int i = 0 ; i < count ; ++i)   // only the first count slots hold real elements
      if (a[i].Equals(t))
        return true;
        
    return false;
  }
}

Note that the contains method above uses the Equals method to compare two values. It cannot use ==, since the == operator is not defined for every type T.

Since DynArray is generic, we can instantiate it with any type we want. For example:

   DynArray<double> a = new DynArray<double>();
   a.add(3.0);
   a.add(4.0);

   DynArray<string> b = new DynArray<string>();
   b.add("yo");

The 'default' operator

The default operator returns the default value for a type:

    WriteLine(default(int));   // writes 0

default is most useful inside a generic class, where it can act on a type parameter. In the DynArray class above, the indexer uses default to return a type's default value if the index is out of bounds.
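
Here is a brief sketch of default applied to several types. The helper method firstOrDefault is hypothetical; it only illustrates default acting on a type parameter.

```csharp
using static System.Console;

class DefaultDemo {
  // hypothetical helper: the first element of a, or T's default value if a is empty
  public static T firstOrDefault<T>(T[] a) => a.Length > 0 ? a[0] : default(T);

  static void Main() {
    WriteLine(default(bool));                            // writes False
    WriteLine(default(string) == null);                  // writes True: null for reference types

    WriteLine(firstOrDefault(new int[] { 7, 8 }));       // writes 7
    WriteLine(firstOrDefault(new int[0]));               // writes 0
    WriteLine(firstOrDefault(new string[0]) == null);    // writes True
  }
}
```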

multiple type parameters

A generic method, class or interface may have multiple type parameters. Here is an interface type for a map from any type to any type:

interface Map<K, V> {
  V this[K key] { get; set; }
}

Here is a class that implements the Map<K, V> interface naively using a dynamic array of key/value pairs:

class ArrayMap<K, V> : Map<K, V> {
  struct Pair {
    public readonly K key;
    public readonly V val;
    public Pair(K key, V val) { this.key = key; this.val = val; }
  }
  DynArray<Pair> a = new DynArray<Pair>();
  
  int? find(K key) {
    for (int i = 0 ; i < a.length ; ++i)
      if (a[i].key.Equals(key)) return i;
    return null;
  }
  
  public V this[K key] {
    get {
      if (find(key) is int i)
        return a[i].val;
      throw new KeyNotFoundException();
    }
    
    set {
      if (find(key) is int i)
        a[i] = new Pair(key, value);
      else a.add(new Pair(key, value));
    }
  }
}

generic constraints

A generic type parameter may include constraints. Here is a method that copies values from one array to another, using a constraint to ensure that the arrays have compatible types:

public static void copy<T, U>(T[] a, U[] b)
    where T : U {
  for (int i = 0 ; i < a.Length ; ++i)
    b[i] = a[i];
}

Each type constraint can have one of the following forms:

T : type – T must be a subtype of the given type
T : struct – T must be a value type
T : class – T must be a reference type
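
Here is a sketch showing two of these constraint forms at work. The copy method is the one above; firstOrNull is a hypothetical helper that relies on the struct constraint, which is what allows it to use the nullable type T?.

```csharp
using static System.Console;

class ConstraintDemo {
  public static void copy<T, U>(T[] a, U[] b)
      where T : U {
    for (int i = 0 ; i < a.Length ; ++i)
      b[i] = a[i];
  }

  // hypothetical helper: the first element of a, or null if a is empty
  public static T? firstOrNull<T>(T[] a) where T : struct =>
    a.Length > 0 ? a[0] : (T?) null;

  static void Main() {
    string[] words = { "one", "two" };
    object[] objs = new object[2];
    copy(words, objs);          // OK: string is a subtype of object
    WriteLine(objs[1]);         // writes two
    // copy(objs, words);       // would not compile: object is not a subtype of string

    WriteLine(firstOrNull(new int[] { 7, 8 }));      // writes 7
    WriteLine(firstOrNull(new int[0]).HasValue);     // writes False
  }
}
```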

comparable objects

Commonly we use a constraint to ensure that a generic type has a built-in ordering, i.e. that it implements the built-in IComparable<T> interface. This interface has a single method:

    int CompareTo (T val);

The method returns

a negative value if this object precedes val in the built-in ordering
0 if this object equals val
a positive value if this follows val in the built-in ordering

The built-in types int, double and string all implement IComparable<T>, for example.

Here is a generic method that returns the largest value in an array of any type, using that type's built-in ordering:

public static T max<T>(T[] a)
                where T : IComparable<T> {
  T m = a[0];
  for (int i = 1 ; i < a.Length ; ++i)
    if (a[i].CompareTo(m) > 0)
      m = a[i];
  return m;
}
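
Here is a usage sketch that calls max on built-in types and on a user-defined type. The Person class is hypothetical; it implements IComparable<Person> so that people are ordered by age.

```csharp
using System;
using static System.Console;

// hypothetical type whose built-in ordering is by age
class Person : IComparable<Person> {
  public readonly string name;
  public readonly int age;
  public Person(string name, int age) { this.name = name; this.age = age; }

  public int CompareTo(Person other) => age.CompareTo(other.age);
}

class MaxDemo {
  public static T max<T>(T[] a)
                  where T : IComparable<T> {
    T m = a[0];
    for (int i = 1 ; i < a.Length ; ++i)
      if (a[i].CompareTo(m) > 0)
        m = a[i];
    return m;
  }

  static void Main() {
    WriteLine(max(new int[] { 3, 9, 4 }));                     // writes 9
    WriteLine(max(new string[] { "pear", "fig", "apple" }));   // writes pear

    Person[] people = { new Person("Ann", 31), new Person("Bob", 27) };
    WriteLine(max(people).name);                               // writes Ann
  }
}
```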

Here is a class that can accomplish the same thing. After it receives a series of values via the add method, the max property will contain the largest of the values.

class Maximizer<T> where T : IComparable<T> {
  T _max;
  bool empty = true;   // no values received yet
  
  public void add(T t) {
    if (empty || t.CompareTo(_max) > 0)
      _max = t;
    empty = false;
  }
  
  public T max { get => _max; }
}

comparers

Sometimes we'd like to compare objects using an ordering that is different from their type's built-in ordering. For example, we might like to compare strings not by lexicographic order, but by length.

A comparer is an object that can compare two values of a given type. It implements the built-in IComparer<T> interface, which has a single method:

    int Compare (T x, T y);

The method returns

a negative value if x is less than y
zero if x equals y
a positive value if x is greater than y

Here is the Maximizer class from above, rewritten to use a comparer. Note that it no longer has a generic type constraint:

class Maximizer2<T> {
  IComparer<T> comparer;
  T _max;
  bool empty = true;   // no values received yet
  
  public Maximizer2(IComparer<T> comparer) {
    this.comparer = comparer;
  }
  
  public void add(T t) {
    if (empty || comparer.Compare(t, _max) > 0)
      _max = t;
    empty = false;
  }

  public T max { get => _max; }
}
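
As an illustration, here is a sketch of a comparer that orders strings by length, as in the example mentioned at the start of this section. The class LengthComparer and the two-argument max method are hypothetical; to keep the example short it uses a static method rather than the Maximizer2 class.

```csharp
using System.Collections.Generic;
using static System.Console;

// hypothetical comparer: orders strings by length
class LengthComparer : IComparer<string> {
  public int Compare(string x, string y) => x.Length.CompareTo(y.Length);
}

class ComparerDemo {
  // largest element of a according to the given comparer
  public static T max<T>(T[] a, IComparer<T> comparer) {
    T m = a[0];
    for (int i = 1 ; i < a.Length ; ++i)
      if (comparer.Compare(a[i], m) > 0)
        m = a[i];
    return m;
  }

  static void Main() {
    string[] words = { "pear", "fig", "banana" };
    WriteLine(max(words, new LengthComparer()));       // writes banana (longest)
    WriteLine(max(words, Comparer<string>.Default));   // writes pear (built-in ordering)
  }
}
```

The built-in property Comparer<T>.Default provides a comparer that uses a type's own ordering, so one method can serve both purposes.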

Enumerable objects and enumerators

An enumerator implements the built-in IEnumerator<T> interface, which represents a stream of objects of type T. It is like the IntStream interface we saw a few lectures ago, but uses a generic type.

IEnumerator<T> has several methods and properties. The most important are

    T Current { get; }

Return the current value in the enumeration. You must call MoveNext once before retrieving the first value!

    bool MoveNext ();

Advance to the next value in the enumeration. Returns false if there are no more elements.

An enumerable object implements the built-in interface IEnumerable<T>, which represents any object that can provide an enumerator. This interface has a couple of methods; the important one is

    IEnumerator<T> GetEnumerator ();

Return an IEnumerator that can traverse all elements in this IEnumerable.

You can only traverse an enumerator once. An enumerable object, however, can be traversed many times; each time, the caller will call GetEnumerator to retrieve an IEnumerator for the traversal.
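
Here is a minimal sketch of a traversal written by hand with these methods, using the built-in List<T> class as the enumerable object. The method name sum is just a placeholder.

```csharp
using System.Collections.Generic;
using static System.Console;

class EnumeratorDemo {
  public static int sum(IEnumerable<int> e) {
    int total = 0;
    using (IEnumerator<int> en = e.GetEnumerator()) {
      while (en.MoveNext())      // must be called before the first read of Current
        total += en.Current;
    }
    return total;
  }

  static void Main() {
    List<int> list = new List<int> { 2, 4, 6 };
    WriteLine(sum(list));        // writes 12
    WriteLine(sum(list));        // writes 12 again: each call retrieves a fresh enumerator

    int total = 0;
    foreach (int i in list)      // foreach performs the same calls behind the scenes
      total += i;
    WriteLine(total);            // writes 12
  }
}
```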

Enumerable objects are important because

the built-in foreach statement can iterate over any enumerable object
all built-in collection classes are enumerable

Unfortunately it's a bit of a bother to implement an enumerable object, since you must implement a fair number of methods. For completeness, here's an implementation of a class Range that represents a range of integers and is enumerable:

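using System;                       // NotSupportedException
using System.Collections;           // non-generic IEnumerator and IEnumerable
using System.Collections.Generic;   // IEnumerator<T> and IEnumerable<T>

class RangeEnumerator : IEnumerator<int> {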
  int i, end;
  
  public RangeEnumerator(int start, int end) {
    this.i = start - 1; this.end = end;
  }
  
  public int Current { get => i; }
  
  object IEnumerator.Current { get => Current; }
  
  public bool MoveNext() => ++i <= end;
  
  public void Reset() => throw new NotSupportedException();
  
  public void Dispose() { }
}

class Range : IEnumerable<int> {
  int start, end;
  
  public Range(int start, int end) {
    this.start = start; this.end = end;
  }
  
  public IEnumerator<int> GetEnumerator() => new RangeEnumerator(start, end);
  
  IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

Built-in collection classes

The standard C# library contains a number of built-in collection classes in the System.Collections.Generic namespace. Here is a picture of their type hierarchy:

IEnumerable<T>
- ICollection<T>
  - IDictionary<K, V>
    - Dictionary<K, V> (hash table)
    - SortedDictionary<K, V> (balanced binary tree)
    - SortedList<K, V> (sorted array)
  - IList<T>
    - List<T> (array)
  - ISet<T>
    - HashSet<T> (hash table)
    - SortedSet<T> (balanced binary tree)
- Queue<T> (circular array)
- Stack<T> (array)

For details about these interfaces and classes, see the C# library quick reference.

These classes are very useful, and we will be using them often in this course. The List<T> class is especially useful: it is a dynamic array, similar to the DynArray<T> class we wrote above.

Of course, a major goal of this course is not only to be able to use collection classes like these, but also to understand how they are implemented and the performance tradeoffs between them.
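
Here is a brief sketch that exercises a few of these classes. The helper method countWords is hypothetical and is only one of many ways to count occurrences with a dictionary.

```csharp
using System.Collections.Generic;
using static System.Console;

class CollectionsDemo {
  // hypothetical helper: count how many times each word occurs
  public static Dictionary<string, int> countWords(IEnumerable<string> words) {
    var counts = new Dictionary<string, int>();
    foreach (string w in words) {
      counts.TryGetValue(w, out int n);   // n is 0 if w is not yet a key
      counts[w] = n + 1;
    }
    return counts;
  }

  static void Main() {
    List<string> words = new List<string> { "b", "a", "b" };
    words.Add("c");
    WriteLine(words.Count);                    // writes 4
    WriteLine(words[0]);                       // writes b

    Dictionary<string, int> counts = countWords(words);
    WriteLine(counts["b"]);                    // writes 2

    SortedSet<string> distinct = new SortedSet<string>(words);
    WriteLine(string.Join(" ", distinct));     // writes a b c
  }
}
```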

Inverting a dictionary

Here is a generic method that can invert a dictionary: given a dictionary that maps keys to values, it constructs an inverse dictionary that maps the values to the keys. (This assumes that all values are unique.)

  static IDictionary<V, K> invert<K, V>(IDictionary<K, V> d) {
    var e = new Dictionary<V, K>();
    foreach (K key in d.Keys)
      e[d[key]] = key;
    return e;
  }

Here's an alternate implementation that iterates directly over the key/value pairs in the source dictionary, which is perhaps slightly clearer:

  static IDictionary<V, K> invert<K, V>(IDictionary<K, V> d) {
    var e = new Dictionary<V, K>();
    foreach (KeyValuePair<K, V> pair in d)
      e[pair.Value] = pair.Key;
    return e;
  }
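
Here is a short usage sketch, with the method repeated so that the example is self-contained. The class name InvertDemo is just a placeholder.

```csharp
using System.Collections.Generic;
using static System.Console;

class InvertDemo {
  public static IDictionary<V, K> invert<K, V>(IDictionary<K, V> d) {
    var e = new Dictionary<V, K>();
    foreach (KeyValuePair<K, V> pair in d)
      e[pair.Value] = pair.Key;
    return e;
  }

  static void Main() {
    var d = new Dictionary<string, int> { ["one"] = 1, ["two"] = 2 };
    IDictionary<int, string> e = invert(d);    // K = string, V = int are inferred
    WriteLine(e[2]);                           // writes two
    WriteLine(e.Count);                        // writes 2
  }
}
```

If the values were not unique, the indexer assignment would silently overwrite an earlier entry, so the inverse dictionary would have fewer entries than the original.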

binary search tree review

Below is a review of binary search trees, which may be helpful for this week's homework assignment. For more details about binary search trees, see e.g. Introduction to Algorithms, ch. 12.

binary trees

A binary tree holds a set of values. A binary tree has zero or more nodes, each of which contains a single value. The tree with no nodes is called the empty tree. Any non-empty tree consists of a root node plus its left and right subtrees, which are also (possibly empty) binary trees.

Here is a picture of a binary tree:

[figure: binary tree]

In this tree, a is the root node. Node b is the parent of nodes d and e. Node d is the left child of b, and node e is b's right child. Node e has a left child but no right child. Node c has a right child but no left child.

The subtree rooted at b is the left subtree of node a.

The nodes d, f, h and i are leaves: they have no children. Nodes a, b, c, e and g are internal nodes, which are nodes that are not leaves.

binary search trees

A binary search tree is a tree of ordered values such as integers or strings in which, for any node N with value v,

all values in N's left subtree are less than v
all values in N's right subtree are greater than v

Here is a binary search tree of integers:

[figure: binary search tree of integers]

finding a value in a binary search tree

Finding a value in a binary search tree is straightforward. To find the value v, we begin at the root. Let r be the root node's value. If v = r, we are done. Otherwise, if v < r then we recursively search for v in the root's left subtree; if v > r then we search in the right subtree.
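
The search above might be sketched as follows. The Node class and the method name contains are assumptions made for this illustration, as is the small tree built in Main. The sketch also handles one case the description leaves implicit: if we reach an empty subtree, the value is not present.

```csharp
using static System.Console;

class Node {                   // hypothetical node type holding an integer
  public int val;
  public Node left, right;
  public Node(int val, Node left = null, Node right = null) {
    this.val = val; this.left = left; this.right = right;
  }
}

class BstFind {
  public static bool contains(Node n, int v) {
    if (n == null) return false;     // empty tree: v is not present
    if (v == n.val) return true;     // found it
    return v < n.val ? contains(n.left, v)     // smaller values are on the left
                     : contains(n.right, v);   // larger values are on the right
  }

  static void Main() {
    // a small example tree:   10
    //                        /  \
    //                       5    20
    //                           /
    //                         15
    Node root = new Node(10, new Node(5), new Node(20, new Node(15)));
    WriteLine(contains(root, 15));   // writes True
    WriteLine(contains(root, 7));    // writes False
  }
}
```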

inserting into a binary search tree

Inserting a value into a binary search tree is also straightforward. Beginning at the root, we look for an insertion position, proceeding down the tree just as in the above algorithm for finding a node. When we reach an empty left or right child, we create a node there.
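
Here is one possible sketch of insertion, written recursively. The Node class and the method names are assumptions for this illustration. The sketch also assumes that a value already in the tree is simply not inserted again, which the description above does not specify.

```csharp
using System.Collections.Generic;
using static System.Console;

class Node {                   // hypothetical node type holding an integer
  public int val;
  public Node left, right;
  public Node(int val) { this.val = val; }
}

class BstInsert {
  // insert v into the tree rooted at n; returns the root of the updated tree
  public static Node insert(Node n, int v) {
    if (n == null) return new Node(v);   // reached an empty position: create a node here
    if (v < n.val)
      n.left = insert(n.left, v);
    else if (v > n.val)
      n.right = insert(n.right, v);
    // if v == n.val, the value is already present (assumption: no duplicates)
    return n;
  }

  // append the tree's values to result in ascending order
  public static void inOrder(Node n, List<int> result) {
    if (n == null) return;
    inOrder(n.left, result);
    result.Add(n.val);
    inOrder(n.right, result);
  }

  static void Main() {
    Node root = null;
    foreach (int v in new int[] { 10, 20, 5, 15, 18 })
      root = insert(root, v);

    var values = new List<int>();
    inOrder(root, values);
    WriteLine(string.Join(" ", values));   // writes 5 10 15 18 20
  }
}
```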

deleting from a binary search tree

Deleting a value from a binary search tree is a little trickier. It's not hard to find the node to delete: we just walk down the tree, just like when searching or inserting. Once we've found the node N we want to delete, there are several cases.

(a) If N is a leaf (it has no children), we can just remove it from the tree.
(b) If N has only a single child, we replace N with its child. For example, we can delete node 15 in the binary search tree above by replacing it with 18.
(c) If N has two children, then we must replace it with the next-highest node in the tree. To do this, we start at N's right child and follow left child pointers for as long as we can. This will take us to the smallest node in N's right subtree, which must be the next-highest node in the tree after N. Call this node M. We must remove M from the right subtree, and fortunately this is easy: M has no left child, so we can remove it following either case (a) or (b) above. Now we update the node N, setting its value to the value that was in M.

As a concrete example, suppose that we want to delete the root node (with value 10) in the tree above. This node has two children. We start at its right child (20) and follow its left child pointer to 15. That’s as far as we can go in following left child pointers, since 15 has no left child. So now we remove 15, following case (b) above, and then replace the value 10 with 15 at the root.
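
The deletion cases might be sketched as follows. The Node class and the method names are assumptions for this illustration. The tree built in Main is a hypothetical one, chosen to be consistent with the values mentioned in the example: root 10, right child 20, whose left child 15 has a single right child 18.

```csharp
using System.Collections.Generic;
using static System.Console;

class Node {                   // hypothetical node type holding an integer
  public int val;
  public Node left, right;
  public Node(int val) { this.val = val; }
}

class BstDelete {
  public static Node insert(Node n, int v) {
    if (n == null) return new Node(v);
    if (v < n.val) n.left = insert(n.left, v);
    else if (v > n.val) n.right = insert(n.right, v);
    return n;
  }

  // remove v from the tree rooted at n; returns the root of the updated tree
  public static Node remove(Node n, int v) {
    if (n == null) return null;                  // v was not in the tree
    if (v < n.val)
      n.left = remove(n.left, v);
    else if (v > n.val)
      n.right = remove(n.right, v);
    else if (n.left == null)
      return n.right;                            // leaf, or only a right child
    else if (n.right == null)
      return n.left;                             // only a left child
    else {                                       // two children
      Node m = n.right;
      while (m.left != null)                     // find the smallest node in the right subtree
        m = m.left;
      n.val = m.val;                             // copy its value into n
      n.right = remove(n.right, m.val);          // then remove it from the right subtree
    }
    return n;
  }

  public static void inOrder(Node n, List<int> result) {
    if (n == null) return;
    inOrder(n.left, result);
    result.Add(n.val);
    inOrder(n.right, result);
  }

  static void Main() {
    Node root = null;
    foreach (int v in new int[] { 10, 5, 20, 15, 18 })
      root = insert(root, v);

    root = remove(root, 10);               // delete the root, which has two children
    WriteLine(root.val);                   // writes 15

    var values = new List<int>();
    inOrder(root, values);
    WriteLine(string.Join(" ", values));   // writes 5 15 18 20
  }
}
```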