There was no lecture or tutorial this week.
Our topics for this week are inheritance and the C# collection classes. They are discussed in our textbook Essential C# 7.0 (Chapter 7 "Inheritance"; Chapter 17 "Building Custom Collections": Primary Collection Classes) and are also covered in Programming C# 8.0 (Chapter 5 "Collections"; Chapter 6 "Inheritance").
Here are some additional notes.
C# (along with most other object-oriented languages) supports inheritance, a mechanism that allows a class to extend another class and to change its behavior. Interfaces may be inherited as well. We have already seen inheritance in Programming 1, where we briefly discussed inheritance between classes in Python. However we will emphasize inheritance more in Programming 2, for several reasons. Inheritance plays a major role in the C# collection class library and in other popular class libraries such as graphical interface toolkits. Furthermore, in Programming 2 we will start to write larger programs in which we may have our own uses for inheritance.
Let's begin looking at inheritance in C# using an example that you
may recall from our discussion of inheritance in Python. Suppose that
we have a simple C# class that implements a fixed-size stack of
values of type double
:
class Stack { double[] a; int count; public Stack(int maxSize) { a = new double[maxSize]; } public void push(double d) { a[count] = d; count += 1; } public double pop() { count -= 1; return a[count]; } public bool isEmpty => count == 0; }
We'd now like to write a class AvgStack
that is like
Stack
, but has an additional method average()
that can report the average value of the numbers that are currently
on the stack. We would like average()
to run in O(1). To
achieve that, AvgStack
will keep track of the current
total of all numbers on the stack, so that it can compute the average
instantly by dividing the total by the count.
We can write AvgStack
using inheritance. Here is an implementation:
class AvgStack : Stack { double total; public AvgStack(int maxSize) : base(maxSize) { } public override void push(double d) { base.push(d); total += d; } public override double pop() { double d = base.pop(); total -= d; return d; } public double average() => total / count; }
Let's look at this code in
some detail. The notation class AvgStack : Stack
means that the class AvgStack
inherits from
Stack
. (You will notice that this is
just like the syntax for indicating that a class implements an
interface.) In this situation, we say that Stack
is the superclass
(or parent
class or base
class),
and AvgStack
is
a subclass (or
derived class).
AvgStack
has
a public constructor that takes a single argument maxSize
,
just like the parent class. The notation : base(maxSize)
means that the AvgStack
constructor chains
to the base class
constructor – in other words, it calls the base class constructor,
passing it the same value of maxSize
that
it itself received. (You will recall that we previously saw similar
syntax for chaining to a different constructor in the same class.
When chaining to a constructor in the same class, we write :
this()
; when chaining to a constructor in the base class, we
write : base()
.)
The AvgStack
constructor does not need to initialize
the total
field
to 0.0, since 0.0 is the default value for fields of type
double
.
AvgStack
overrides the push
and
pop
methods from its parent class. That means that
AvgStack
provides its own implementation of these
methods. In the push()
method, AvgStack calls
base.push(d)
to call the same-named method in the base
class. It then runs total += d
to update the running
total. pop()
is similar.
Note that if you attempt to build
the Stack
and
AvgStack
classes
as written above, the code will
not compile! We
need to make two changes to the base class Stack
:
1. We must modify the
push()
and
pop()
methods
to be virtual
:
public virtual void push(double d) { a[count] = d; count += 1; } public virtual double pop() { count -= 1; return a[count]; }
In C#,
a subclass may only override base class methods if they are virtual
.
(This is a significant difference from Java, where any method may be
overridden.)
2.
We must modify the count
field
in the base class to be protected
:
protected int count; // in Stack class
That is because the
average()
method
in AvgStack
accesses count
:
public double average() => total / count; // in AvgStack class
In C#, the default access level for any member (e.g. field,
method) is private
. So if we use the simple declaration
"int count;
", then count
will be private
, which
means that it will accessible only within its own
class. A protected
member, by constract, is
accessible within its own class
and within any subclasses. Finally, as we have seen before, a
member marked as public
is accessible from anywhere.
Now that we have written
a parent class Stack
and subclass AvgStack
,
let's use them. We can create instances of each of these classes as
follows:
Stack s = new Stack(100); // create a Stack AvgStack t = new AvgStack(100); // create an AvgStack
How about the following – will this work?
AvgStack t = new Stack(100); // does not compile
We are creating a Stack
, and attempting to store it in a
variable of type AvgStack
. The code will not compile.
If t
has
type AvgStack
,
then it cannot hold this object because a
Stack
is
not an AvgStack
.
In general, an instance of a superclass (Stack
)
is not an instance of a subclass (AvgStack
)
because it doesn't have the extended capabilities that the subclass
provides.
On the other hand, consider this:
Stack s = new AvgStack(100); // works OK
This code will compile: a variable of type Stack
can
hold an AvgStack
, because an AvgStack
is a
kind of Stack
. In general, an instance of a subclass
counts as an instance of its superclass (or of any ancestor class).
Now, however: what happens if we invoke a method through a variable whose type is the superclass? In other words, how will the following code behave?
Stack s = new AvgStack(100); for (double d = 0.0 ; d < 10.0 ; d += 1.0) s.push(d); // which push() will this call?
Which implementation of
push()
will be invoked by the code
above – the implementation in Stack
,
or in the subclass AvgStack
?
The code will call the
implementation in AvgStack
.
Even though the object is held in a variable of type Stack
,
it is actually an AvgStack
.
A call to a virtual method is resolved based on an object's actual
class – the type of the variable holding the object is irrelevant.
The assigment Stack
s = new AvgStack(100)
works
because the type AvgStack
is
implicitly
convertible to
Stack
.
Because of this phenomenon, we may write a method that works
on Stack
objects and may pass it instances of the
AvgStack
subclass. For example:
static void empty(Stack s) { while (!s.isEmpty) Console.WriteLine(s.pop()); }
We may pass an AvgStack
to this method:
AvgStack s = new AvgStack(100); for (double d = 0.0 ; d < 10.0 ; d += 1.0) s.push(d); empty(s);
We have seen that C# will implicitly convert from a subclass to a superclass type. In the other direction, C# provides an explicit conversion that we can access via a type cast. For example:
Stack s = new AvgStack(100); AvgStack t = (AvgStack) s; // explicit conversion
We can use an explicit conversion to a subclass type when we are sure that a variable holds a object of that subclass. On the other hand, if the object does not actually belong to the subclass then the program will exit with an exception:
Stack s = new Stack(100); AvgStack t = (AvgStack) s; // compiles, but will throw an exception when run
Suppose that we'd like to write generic classes that implement several kinds of collections:
Stack<T>
, a LIFO
stack
Queue<T>
, a FIFO queue
PriorityQueue<T>
, a priority queue
Furthemore we would like to use these classes somewhat interchangeably, so we'd like them all to provide a common set of members:
a constructor that creates an empty collection
a constructor that takes an array of type T[]
and uses it to initialize the collection
a method void add(T val)
a method T remove()
a property count
that returns the current number
of items in the collection
a property isEmpty
that is true if the
collection is empty
The add()
and remove()
methods will work
differently in the different classes. For example, in a Queue<T>
the remove()
method will remove the element that was
added first, whereas in a PriorityQueue<T>
it will
remove the smallest value.
We could write all these classes independently. But since they
share some common functionality, it might be nicer to use inheritance
so that we can write the common code only once. However, none of
these classes can reasonably inherit from each other. (Perhaps a
priority queue sounds like a particular kind of queue, but the data
structure for implementing a priority queue (e.g. a binary heap) will
be completely different from the data structure for implementing a
FIFO queue (e.g. a linked list), and so PriorityQueue<T>
should really not inherit from Queue<T>
.)
Instead, a better idea is to write a top-level class Collection
that can serve as a superclass for all of these classes. Then we can
implement the common functionality in the Collection
class. Of course, it would make no sense to create an object of class
Collection
; the only purpose of Collection
is for other classes to derive from this. So we will mark Collection
as an abstract class. An
abstract class cannot be instantiated – that
is to say, you cannot create an instance of it. Furthermore, an
abstract class can have abstract methods,
which have no implementation: a concrete derived class must provide
an implementation of these methods.
Here
is the base class Collection
<T>
:
abstract class Collection<T> { protected int _count; protected Collection() { } protected Collection(T[] a) { foreach (T val in a) add(val); } public int count => _count; public bool isEmpty => _count == 0; protected abstract void _add(T val); protected abstract T _remove(); public void add(T a) { _add(a); _count += 1; } public T remove() { T val = _remove(); _count -= 1; return val; } }
The field _count
holds the number of elements in a
collection. Its name is preceded with an underscore to distinguish it
from the public read-only count
property. The class has
two constructors: one creates an empty Collection
and
another initializes a Collection
with elements from an
array. The constructors are protected, not public, since they will
only be called by subclasses.
The public methods add()
and remove()
call the abstract methods _add()
and _remove()
to perform the actual work of adding and removing elements from a
collection. Notice that these methods have no implementation; they
will be implemented by each subclass.
Here is a subclass Queue<T>
that implements a
FIFO queue using a linked list:
class Node<T> { public T val; public Node<T> next; public Node(T val) { this.val = val; } } class Queue<T> : Collection<T> { Node<T> head, tail; public Queue() { } public Queue(T[] a) : base(a) { } protected override void _add(T val) { if (head == null) head = tail = new Node<T>(val); else { tail.next = new Node<T>(val); tail = tail.next; } } protected override T _remove() { T val = head.val; head = head.next; return val; } }
Notice that we must write override
when implementing an
abstract method.
I will not provide implementations of the other subclasses.
Hopefully it is clear enough, however, that the functionality in the
top-level Collection<T>
class could reasonably be
shared by all subclasses.
You may notice some similarities between abstract classes and interfaces. In particular, they can each specify methods which need to be implemented by concrete classes. There are some differences, however: abstract classes may contain fields and constructors, and interfaces may not. An interface is something like an abstract class that only declares a set of abstract methods.
All C# class automatically inherit from a top-level class called
Object
which contains a modest number of methods that
are shared by all objects. One
of these methods is ToString
:
virtual string ToString ();
C# automatically calls this method when it needs to convert any object to a string, for example when writing an object to the console:
Window w = new Window(); Console.WriteLine(w); // will automatically call w.ToString()
The default implementation of ToString simply returns the name of the class, so the above code will write
Window
You can override ToString()
in your own classes if you'd
like to change their printed representation. (ToString()
is like the __repr
magic method in Python, which you may
remember.)
Object contains a couple of other methods (Equals
,
GetHashCode
) that we will consider when we discuss the
C# collection classes below.
Up to now we have only discussed inheritance between classes. C# also allows an interface to inherit from another interface. An inheriting interface has all the members of its parent interface, and can add additional members of its own. Here is a somewhat abstract example:
interface Shape { double perimeter(); double area(); } interface Polygon : Shape { int num_vertices { get; } } class Circle : Shape { … } class Rectangle : Polygon { … } class Triangle : Polygon { … }
In this example, a Shape
represents a shape in two
dimensions and a
Polygon
is a Shape
that has a finite number
of vertices. Because the Triangle
class implements
Polygon
, it must provide an implementation of
num_vertices
as well as of the perimeter()
and area()
methods.
C# provides
single class inheritance: a class may inherit from only one other class
multiple interface inheritance: an interface may inherit from multiple interfaces
This particular combination was popularized by Java. Many object-oriented languages (including Python) have a more general mechanism: they have multiple class inheritance, so that one class may have several immediate superclasses. That is more powerful, but creates various complexities that C# avoids through its simpler model.
LIke many object-oriented languages, C# contains a set of collection classes in its standard library. These classes implement common data structures such as stacks, queues and hash tables. They are generic and make use of inheritance.
Here is a picture of the class hierarchy of some of the major collection classes and interfaces:
All of these classes and
interfaces live in the namespace System.Collections.Generic
.
In the picture, an arrow from A to B means that B implements or inherits from A.
All entities above whose
names begin with the letter I (e..g ICollection
,
IDictionary
)
are interfaces.
This is a standard naming convention in the C# standard library.
The picture above includes the following classes:
List<T>
is
a dynamic array, similar to a list in Python. List<T> contains
an add()
method
for appending an element to the end of a List
.
It also contains an indexer so that you can get or set elements by
index.
Stack<T>
implements
a stack using a dynamic array. It includes methods Push()
and P
op()
.
Queue<T>
implements
a queue using a dynamic array, and includes methods Enqueue()
and Dequeue()
.
The classes HashSet<T>
and SortedSet<T>
implement sets, with methods Add()
, Contains()
and Remove()
. They differ in their underlying data
structure: HashSet<>
uses a hash table and
SortedSet<T>
uses a balanced binary tree.
Finally, the classes Dictionary<K, V>
and
SortedDictionary<K, V>
implement dictionaries,
with an indexer that can get/set values as well as a ContainsKey
method. They differ in their underlying data structure:
Dictionary<K, V>
uses a hash table and
SortedDictionary<K, V>
uses a balanced binary
tree.
You can read more details about these classes and interfaces in
our Quick
Library Reference. Note that you will need to follow the class
hierarchy in order to find all the members available in each of
these classes. For example, the methods available in List<T>
include not only the
ones listed under List<T>
in the library
reference, but also those listed under IList<T>
and ICollection<T>
, since List<T>
implements those interfaces.
You can read even more details about the collection classes (including various classes not depicted above) in the official .NET API documentation.
Notice that all collection classes ultimately inherit from an
interface IEnumerable<T>
. This interface contains
various methods that allow the foreach
statement to
iterate over a collection's values. We will not discuss these methods
at this point, but you should certainly be aware that you can use
foreach
to iterate over any collection.
You will want to become fluent in using these collection classes. With them, you finally have easy access to the same useful data structures that we often used in Python, i.e. lists, sets and dictionaries.
Dictionaries in particular are quite useful, so let's
look at the Dictionary<K, V>
class a bit more. As
you can see in the documentation, Dictionary<K, V>
implements the interface IDictionary<K, V>
, which
in turn implements the interface ICollection<KeyValuePair<K,
V>>
. This means that a dictionary is a collection of
key-value pairs. You can use foreach
to iterate over
a dictionary, and each element you get back will be a KeyValuePair<K,
V>
. Each KeyValuePair in turn has properties Key
and Value
. For example:
Dictionary<string, int> d = new Dictionary<string, int>(); d["yellow"] = 10; d["red"] = 15; d["blue"] = 20; foreach (KeyValuePair<string, int> p in d) WriteLine($"key = {p.Key}, value = {p.Value}");
Suppose that we have a class Vector
representing a
vector in an arbitrary number of dimensions:
class Vector { double[] a; public Vector(params double[] a) { this.a = a; } }
We could certainly add more methods to obtain the length of a vector, add two vectors, and so on, but those are not important for our discussion here.
Let's create a HashSet<Vector>
and add a couple
of vectors to it:
HashSet<Vector> s = new HashSet<Vector>(); s.Add(new Vector(1.0, 2.0, 3.0)); s.Add(new Vector(3.0, 4.0, 5.0));
OK – so far so good. Now let's test whether the vector (1.0 2.0 3.0) is in the set:
WriteLine(s.Contains(new Vector(1.0, 2.0, 3.0))); ==> False
Contains
has reported that the set does not contain that
vector! The problem is that the HashSet
does not
consider the first vector we added to be equal to the vector we
passed to Contains()
, despite the fact that they contain
the same numbers. So how can we tell HashSet
that they
should be equal?
As we briefly mentioned above, the top-level Object
class contains a method Equals
:
virtual bool Equals (object obj); // return true if this object equals (obj)
HashSet
calls this method to determine whether two
objects are equal. By default, Equals
just tests for
reference equality, i.e. it returns true only if two objects
are actually one and the same object. But we can override Equals
in our class to specify a different notion of equality:
public override bool Equals(object o) { if (!(o is Vector)) return false; Vector w = (Vector) o; if (a.Length != w.a.Length) return false; for (int i = 0 ; i < a.Length ; ++i) if (a[i] != w.a[i]) return false; return true; }
Our comparison method first returns false if o is not actually a
Vector
. (This is a sneak preview of the is
operator, which will we will study next week). Assuming that it is,
we use a type cast to convert to a Vector
, and then
compare the vectors element by element.
Now the HashSet
class should consider any two vectors
with the same components to be identical. However overriding Equals
is not sufficient. In fact, it we build the code above then the
compiler warns us that we have another task to complete:
prog.cs(4,7): warning CS0659: 'Vector' overrides Object.Equals(object o) but does not override Object.GetHashCode()
As the compiler has suggested, we should
also override the GetHashCode()
method, which also
belongs to the top-level Object
class, and has this
signature:
virtual int GetHashCode ();
Why must we override
GetHashCode()
whenever
we override Equals()
?
The default implementation of GetHashCode()
computes hash codes that
might be different for any two different objects. But if two objects
are considered equal by Equals()
,
such as two different Vector
objects
that have identical components, then they must have the same hash
code so that a hash table implementation (such as HashSet
)
will look for them in the same hash bucket. This should make sense to
you based on our study of hash tables in Intro to Algorithms. (And in
fact you may remember that we encountered this same phenomenon in
Python, where we needed to implement a magic method __hash() if we
implemented a custom equality operator for a class.)
So let's expand our
Equals
class
with a GetHashCode()
method.
For an aggregate object such as a Vector
,
the easiest approach is to combine the hash codes of the underlying
values in some way. For
example:
public override int GetHashCode() { long hash = 0; foreach (double d in a) hash = hash * 1000003 + d.GetHashCode(); return (int) hash; }
(You might recognize the prime constant 1,000,003 above from our discussion of hash functions in Introduction to Algorithms last semester.)
Now there are no compiler
warnings, and looking up a Vector
in a HashSet
has the expected result:
HashSet<Vector> s = new HashSet<Vector>(); s.Add(new Vector(1.0, 2.0, 3.0)); s.Add(new Vector(3.0, 4.0, 5.0)); WriteLine(s.Contains(new Vector(1.0, 2.0, 3.0))); ==> True