All C# class automatically inherit from a
top-level class called Object
which contains a modest
number of methods that are shared by all objects. One
of these methods is ToString
:
virtual string ToString ();
C# automatically calls this method when it needs to convert any object to a string, for example when writing an object to the console:
Window w = new Window(); Console.WriteLine(w); // will automatically call w.ToString()
The default implementation of ToString simply returns the name of the class, so the above code will write
Window
You can override ToString()
in your own classes if you'd
like to change their printed representation. (ToString()
is like the __repr
magic method in Python, which you may
remember.)
Object contains a couple of other methods (Equals
,
GetHashCode
) that we will consider when we discuss the
C# collection classes below.
In Introduction to
Algorithms we learned about various abstract
data types,
such as stacks and queues. For
each abstract data type, we described the methods that it provides.
For example, a stack has methods push()
,
pop()
and
isEmpty()
.
We saw that we can implement each abstract data type in various ways.
For example, we can implement a stack either using a linked list, or
an array. Different implementations of an abstract data type will
have equivalent functionality, though they might have different
performance characteristics.
C# interfaces formalize this concept. An interface defines a set of methods or other members in an abstract data type. For example, here is an interface for a stack of integers:
interface Stack { bool isEmpty(); void push(int i); int pop(); }
There is no implementation yet
- just a set of methods to be implemented. We can write a concrete
class LinkedStack
that implements a stack using a linked list. It might begin like
this:
class LinkedStack : Stack { Node head; public bool isEmpty() { return (head == null); } public void push(int i) { head = new Node(i, head); } public int pop() { … } }
Above, "class
LinkedStack : Stack
" means
that we are defining a class LinkedStack
that implements the Stack
interface. The compiler will check that our class provides an
implementation of every method that was declared in the interface. If
any implementations are missing, we will get a compiler error.
We might then write a second
class ArrayStack
that implements a stack using a fixed-size array. It could begin like
this:
class
ArrayStack : Stack {
int
[] a;
int
num;
// number of elements currently on the stack
public
ArrayStack(
int
maxSize) {
a =
new
int
[maxSize];
}
public
bool
isEmpty() {
return
num ==
0
; }
…
}
We now have two different concrete implementations of a Stack
.
We might create two Stack
objects as follows:
Stack s = new LinkedStack(); Stack t = new ArrayStack(100);
Notice that a variable can have an interface type, such as the variables s and t above. A variable with an interface type can hold any object that implements that interface.
Method parameters can also have an interface type.
For example, here is a method that will empty a Stack
,
printing all values as it does so:
static void empty(Stack s) { while (!s.isEmpty()) { int i = s.pop(); WriteLine(i); } }
We may pass either a LinkedStack
or an ArrayStack
to this method. For example:
ArrayStack as = new ArrayStack(100); for (int i = 0 ; i < 5 ; ++i) as.push(i); empty(as);
Notice that an interface may not contain a constructor. Any class that implements an interface may have its own constructor(s), but those are not part of the interface.
An interface may, however, contain properties and
indexers. For example, let's modify the Stack
interface
so that isEmpty
is a read-only property, not a method:
interface Stack { bool isEmpty { get; } void push(int i); int pop(); }
All members in an interface are implicitly public, so you should not
write the keyword public
in an interface declaration.
However, when you write a class that implements
an interface, all
implementations of the interface's members must be declared public
(as you can see in the examples above).
Note that a class may implement more than one
interface. For example, let's write an interface Countable
for any type that has a current element count:
interface Countable { int count { get; } }
Now we can modify our ArrayStack
class so that it
implements both Stack
and Countable
:
class
ArrayStack : Stack, Countable {
int
[] a;
int
num;
// number of elements currently on the stack
public
ArrayStack(
int
maxSize) {
a =
new
int
[maxSize];
}
public
bool
isEmpty
=> (
num ==
0
);
// implementation of property from Stack
public
int
count => num;
// implementation of property from Countable
…
}
Up to now we have only discussed inheritance between classes. C# also allows an interface to inherit from another interface. An inheriting interface has all the members of its parent interface, and can add additional members of its own. Here is a somewhat abstract example:
interface Shape { double perimeter(); double area(); } interface Polygon : Shape { int num_vertices { get; } } class Circle : Shape { … } class Rectangle : Polygon { … } class Triangle : Polygon { … }
In this example, a Shape
represents a shape in two
dimensions and a
Polygon
is a Shape
that has a finite number
of vertices. Because the Triangle
class implements
Polygon
, it must provide an implementation of
num_vertices
as well as of the perimeter()
and area()
methods.
C# provides
single class inheritance: a class may inherit from only one other class
multiple interface inheritance: an interface may inherit from multiple interfaces
This particular combination was popularized by Java. Many object-oriented languages (including Python) have a more general mechanism: they have multiple class inheritance, so that one class may have several immediate superclasses. That is more powerful, but creates various complexities that C# avoids through its simpler model.
We have now learned enough C# that we can implement data structures such as stacks, queues, and binary trees. However, in the subset of C# that we know so far, each of our implementations will be tied to some specific type. For example, we can write a stack class that holds integers:
class
Int
Stack {
int
[] a;
int
num;
// number of elements currently on the stack
public
Int
Stack(
int
maxSize) {
a =
new
int
[maxSize];
}
}
Similarly, we can write a stack that holds strings:
class
String
Stack {
strin
g
[] a;
int
num;
// number of elements currently on the stack
public
String
Stack(
int
maxSize) {
a =
new
string
[maxSize];
}
}
But this is a significant limitation - we certainly don't want to have to rewrite our data structure classes once for every possible element type!
In Python last semester, we had no such issue because Python is dynamically typed. When we wrote a stack class in Python, we could add anything at all to it:
# Python code s = new Stack() s.push(14) s.push('hello') s.push([1, 2, 3])
C# is different: it is a statically typed language, meaning that every variable has a type. And so C# provides a different mechanism called generic classes. Generic classes let us write a container class only once and reuse its implementation for every possible element type.
A generic
class has one or more type
parameters. For
example, we can implement a class called FixedStack<T>
.
This means "a fixed-size stack of objects of type T". Our
class might look like this:
class
FixedStack<T> {
T
[] a;
int
num;
// number of elements currently on the stack
public
FixedStack(
int
maxSize) {
a =
new
T
[maxSize];
}
public
void
push(T val) {
a[num] = val;
num +=
1
;
}
public
T pop() {
num -=
1
;
T ret = a[num];
return
ret;
}
}
In the class above, T is a type parameter. When
the caller creates a FixedStack
,
they will specify the type T:
FixedStack<int> s = new FixedStack<int>(100); // create a stack of ints s.push(15); s.push(25); FixedStack<string> t = new FixedStack<string>(100); // create a stack of strings t.push("hello"); t.push("goodbye");
Notice that inside the class
FixedStack<T>
,
the type T can be used like any other type: we can create an array of
objects of type T, or use T as a method parameter type, a method
return type or a variable type.
Observer
that a FixedStack
is
not quite like our stack implementation in Python, where a single
stack could hold objects of various types. In C#, by contrast, each
FixedStack
is
tied to some particular element type. That is a good thing: if we
create a FixedStack<int>
,
intending to hold ints, and then attempt to push a string to it by
mistake, we will get a compile-time error.
By the way, the word T above is arbitrary: a type parameter can have any name you like. It is traditional, however, in C# (and other languages with generics) to use "T" for classes that take a single type parameter.
A
single class may take multiple type parameters. For example, suppose
that we want to write a Dictionary
class that maps keys
to values using a hash table. It might look
like this:
class Dictionary<K, V> { … public void add(K key, V val) { … } public bool contains(K key) { … } }
Here, K and V are two type parameters. When the caller creates a
Dictionary
, they will specify types for K and V:
Dictionary<int, string> d = new Dictionary<int, string>(); // maps int → string d.add(10, "sky"); d.add(20, "purple"); Dictionary<string, double> e = new Dictionary<string, double>(); // maps string → double e.add("purple", 55.2);
Let's consider how we could implement such a class. As we learned in Intro to Algorithms, a hash table contains an array of hash chains, each of which consists of a linked list of nodes. So we will need a Node class that is itself generic:
class Node<K, V> { public K key; public V val; public Node<K, V> next; public Node(K key, V val) { this.key = key; this.val = val; } }
Notice that inside a
generic class such as Node
, we must specify the type
parameters <K, V> when we refer to the class
itself (e.g in the field declaration Node<K, V>
next
), but the constructor declaration does not include the
type parameters.
Now, inside the Dictionary
class we
could have a field that holds an array of hash chains of appropriate
type:
class Dictionary<K, V> { Node<K, V> a[]; }
We
discussed generic classes above. An interface may also be generic.
For example, let's revisit the Stack
interface we wrote before,
which described a stack of ints:
interface Stack { bool isEmpty { get; } void push(int i); int pop(); }
We will now modify it to be a generic interface Stack<T>
,
which is a stack of elements of type T:
interface Stack<T> { bool isEmpty { get; } void push(T i); T pop(); }
Above, we wrote a class FixedStack<T>
,
implementing a fixed-size stack of elements of type T. We can now
modify that class so that it implements the Stack<T>
interface:
class
FixedStack<T> : Stack<T> {
T
[] a;
int
num;
// number of elements currently on the stack
…
}
Once again, a variable may have an interface type:
Stack<double> s = new FixedStack<double>(100); s.push(2.0); s.push(4.0);
Many types are naturally ordered: for any two values, we can say whether one is greater or less than the other. For example, we can compare integers or doubles in the obvious fashion, or can compare strings alphabetically. For other types, an ordering may not be so obvious: for example, does it make sense to say that one stack is greater or less than another? Perhaps not.
The C# standard library contains a generic
interface IComparable<T>
that is implemented by
all ordered built-in types. IComparable<T>
means
"comparable with objects of type T". For example, the
built-in type int
implements IComparable<int>
,
since integers are comparable with integers. Similarly, string
implements IComparable<string>
.
IComparable<T>
is
defined like this:
interface IComparable<T> { int CompareTo (T other); }
The CompareTo
method returns
a negative number
if this object is less than other
0 if this object equals other
a positive number
if this object is greater
than other
For example:
WriteLine(4.CompareTo(7)); // writes -1, since 4 < 7
IComparable<T>
is useful because it enables us to
write generic classes that compare their elements, as we will see in
the next section.
The generic class FixedStack<T>
that we wrote above treats
its elements as opaque
values: it stores
them, but does not know anything about their behavior.
Now
suppose that we want to write a class MaxStack<T>
which holds a stack of
values of type T, and also provides a method T max()
that returns the largest
value currently on the stack. You might think that we could implement
the class like this:
class MaxStack<T> { T[] a; int num; … T max() { T v = a[0]; // Scan the stack, looking for the largest element. for (int i = 1 ; i < num ; ++i) if (a[i] > v) // we found a larger element v = a[i]; // remember it return v; } }
This code will not compile. The problem is that we can't use the > operator to compare two values of type T, because T might be some type that is not ordered – for example, T might be an array type, and arrays cannot be compared in C#.
So we will need to modify
MaxStack
so that it's only possible to create a MaxStack<T>
if T is an ordered type. C# provides constraints
for this purpose. As we saw in the previous section, any ordered type
implements the IComparable<T>
interface. So we can add a constraint to MaxStack<T>
that says that T must implement IComparable<T>
:
class MaxStack<T> where T : IComparable<T> { … }
And now in the class we can implement the max() method like this:
T max() { T v = a[0]; // Scan the stack, looking for the largest element. for (int i = 1 ; i < num ; ++i) if (a[i].CompareTo(v) > 0) // we found a larger element v = a[i]; // remember it return v; }
Notice that even if T is
constrained to implement IComparable<T>
,
we still cannot use the < operator to compare two elements of type
T – instead, we must call CompareTo()
.
In fact, in C# you may not ever use comparison operators to compare
values of generic types, even the == operator! However, if T is known
to implement IComparable<T>
then you can test whether two values of type T are identical by
calling CompareTo()
and checking for the return value 0. It would be nice if C# allowed
comparison operators in this context (and some other languages do,
e.g. Kotlin), but it does not.
So far we have used generics only to write entire classes. C# also allows us to write generic methods that take one or more type parameters, even outside a generic class. Occasionally this is useful. For example, suppose that we want to write a static method that swaps two values of any type T. We can write this as as a generic method:
public static void swap<T>(ref T a, ref T b) { T t = a; a = b; b = t; }
Suppose that we want to write a generic array
class ExtArray<T>
that holds elements of type T
and has a special behavior: if the caller attempts to read
an element that is out of bounds, then instead of an exception they
will receive a default value in return. We would like this default
value to be the "natural" default for the element type; for
example, an ExtArray<int>
should return 0 if an
access is out of bounds, but an ExtArray<bool>
should return false.
C# provides an operator default
that
provides this. For any type T, default(T) returns C#'s default value
for that type:
WriteLine(default(int)); // writes 0 WriteLine(default(bool)); // writes false
Now we can implement ExtArray<T>
as follows:
class ExtArray<T> { T[] a; int count; … public T this[int index] { // return default value if out of bounds get => index < count ? a[index] : default(T); set => … } }
LIke many object-oriented languages, C# contains a set of collection classes in its standard library. These classes implement common data structures such as stacks, queues and hash tables. They are generic and make use of inheritance.
Here is a picture of the class hierarchy of some of the major collection classes and interfaces:
All of
these classes and interfaces live in the namespace
System.Collections.Generic
.
In the picture, an arrow from A to B means that B implements or inherits from A.
All
entities above whose names begin with the letter I (e..g ICollection
,
IDictionary
)
are interfaces.
This is a standard naming convention in the C# standard library.
The picture above includes the following classes:
List<T>
is
a dynamic array, similar to a list in Python. List<T> contains
an add()
method
for appending an element to the end of a List
.
It also contains an indexer so that you can get or set elements by
index.
Stack<T>
implements
a stack using a dynamic array. It includes methods Push()
and P
op()
.
Queue<T>
implements
a queue using a dynamic array, and includes methods Enqueue()
and Dequeue()
.
The classes HashSet<T>
and
SortedSet<T>
implement sets, with methods Add()
,
Contains()
and Remove()
. They differ in
their underlying data structure: HashSet<>
uses a
hash table and SortedSet<T>
uses a balanced
binary tree.
Finally, the classes Dictionary<K,
V>
and SortedDictionary<K, V>
implement
dictionaries, with an indexer that can get/set values as well as a
ContainsKey
method. They differ in their underlying
data structure: Dictionary<K, V>
uses a hash
table and SortedDictionary<K, V>
uses a balanced
binary tree.
You can read more details about these classes and
interfaces in our Quick
Library Reference. Note that you will need to follow the class
hierarchy in order to find all the members available in each of
these classes. For example, the methods available in List<T>
include not only the
ones listed under List<T>
in the library
reference, but also those listed under IList<T>
and ICollection<T>
, since List<T>
implements those interfaces.
You can read even more details about the collection classes (including various classes not depicted above) in the official .NET API documentation.
Notice that all collection classes ultimately
inherit from an interface IEnumerable<T>
. This
interface contains various methods that allow the foreach
statement to iterate over a collection's values. We will not discuss
these methods at this point, but you should certainly be aware that
you can use foreach
to iterate over any collection.
You will want to become fluent in using these collection classes. With them, you finally have easy access to the same useful data structures that we often used in Python, i.e. lists, sets and dictionaries.
Dictionaries in particular are quite useful, so
let's look at the
Dictionary<K, V>
class a bit more. As you can see
in the documentation, Dictionary<K, V>
implements
the interface IDictionary<K, V>
, which in turn
implements the interface ICollection<KeyValuePair<K, V>>
.
This means that a dictionary is a collection of key-value pairs.
You can use foreach
to iterate over a dictionary, and
each element you get back will be a KeyValuePair<K, V>
.
Each KeyValuePair in turn has properties Key
and Value
.
For example:
Dictionary<string, int> d = new Dictionary<string, int>(); d["yellow"] = 10; d["red"] = 15; d["blue"] = 20; foreach (KeyValuePair<string, int> p in d) WriteLine($"key = {p.Key}, value = {p.Value}");
Suppose that we have a class Vector
representing a vector in an arbitrary number of dimensions:
class Vector { double[] a; public Vector(params double[] a) { this.a = a; } }
We could certainly add more methods to obtain the length of a vector, add two vectors, and so on, but those are not important for our discussion here.
Let's create a HashSet<Vector>
and add a couple of vectors to it:
HashSet<Vector> s = new HashSet<Vector>(); s.Add(new Vector(1.0, 2.0, 3.0)); s.Add(new Vector(3.0, 4.0, 5.0));
OK – so far so good. Now let's test whether the vector (1.0 2.0 3.0) is in the set:
WriteLine(s.Contains(new Vector(1.0, 2.0, 3.0))); ==> False
Contains
has reported that the set does not contain that
vector! The problem is that the HashSet
does not
consider the first vector we added to be equal to the vector we
passed to Contains()
, despite the fact that they contain
the same numbers. So how can we tell HashSet
that they
should be equal?
As we briefly mentioned above, the top-level
Object
class contains a method Equals
:
virtual bool Equals (object obj); // return true if this object equals (obj)
HashSet
calls this method to determine whether two
objects are equal. By default, Equals
just tests for
reference equality, i.e. it returns true only if two objects
are actually one and the same object. But we can override Equals
in our class to specify a different notion of equality:
public override bool Equals(object o) { if (!(o is Vector)) return false; Vector w = (Vector) o; if (a.Length != w.a.Length) return false; for (int i = 0 ; i < a.Length ; ++i) if (a[i] != w.a[i]) return false; return true; }
Our comparison method first returns false if o is not actually a
Vector
. (This is a sneak preview of the is
operator, which will we will study next week). Assuming that it is,
we use a type cast to convert to a Vector
, and then
compare the vectors element by element.
Now the HashSet
class should consider
any two vectors with the same components to be identical. However
overriding Equals
is not sufficient. In fact, it we
build the code above then the compiler warns us that we have another
task to complete:
prog.cs(4,7): warning CS0659: 'Vector' overrides Object.Equals(object o) but does not override Object.GetHashCode()
As the compiler has suggested, we should
also override the GetHashCode()
method, which also
belongs to the top-level Object
class, and has this
signature:
virtual int GetHashCode ();
Why must we override
GetHashCode()
whenever
we override Equals()
?
The default implementation of GetHashCode()
computes hash codes that
might be different for any two different objects. But if two objects
are considered equal by Equals()
,
such as two different Vector
objects
that have identical components, then they must have the same hash
code so that a hash table implementation (such as HashSet
)
will look for them in the same hash bucket. This should make sense to
you based on our study of hash tables in Intro to Algorithms. (And in
fact you may remember that we encountered this same phenomenon in
Python, where we needed to implement a magic method __hash() if we
implemented a custom equality operator for a class.)
So let's
expand our Equals
class
with a GetHashCode()
method.
For an aggregate object such as a Vector
,
the easiest approach is to combine the hash codes of the underlying
values in some way. For example:
public override int GetHashCode() { long hash = 0; foreach (double d in a) hash = hash * 1000003 + d.GetHashCode(); return (int) hash; }
(You might recognize the prime constant 1,000,003 above from our discussion of hash functions in Introduction to Algorithms last semester.)
Now there
are no compiler warnings, and looking up a Vector
in a HashSet
has the expected result:
HashSet<Vector> s = new HashSet<Vector>(); s.Add(new Vector(1.0, 2.0, 3.0)); s.Add(new Vector(3.0, 4.0, 5.0)); WriteLine(s.Contains(new Vector(1.0, 2.0, 3.0))); ==> True