An abstract data type specifies a set of operations that an object may provide. In the next few lectures of this course, we will see various abstract data types including stacks, queues, sets, and dictionaries.
We will see that for most abstract data types there are different possible implementations using various data structures. We'll also see that the big-O running time of operations will often vary between implementations.
A stack is an abstract data type that provides two operations called 'push' and 'pop'. push(x) adds a value x to a stack, and pop() removes the value that was most recently pushed and returns it. This is like a stack of sheets of paper on a desk, where sheets can be added or removed at the top.
In other words, a stack is a last in first out (LIFO) data structure: the last element that was added is the first to be removed.
Often a stack will provide an additional operation is_empty() that returns true if it contains no elements.
Before we implement a stack, let's look at how one can be used:
# initially 's' is an empty stack s.push(4) s.push(8) s.push(1) while not s.is_empty() do print(pop(s))
This code will write
1 8 4
An easy way to implement a stack in Python is using a list, i.e. an array:
class ArrayStack: def __init__(self): self.a = [] def is_empty(self): return len(self.a) == 0 def push(self, x): self.a.append(x) def pop(self): assert not self.is_empty(), 'stack is empty' return self.a.pop()
The implementation is straightforward. The list contains all stack elements, with the top of the stack (i.e. the most recently pushed element) at the end of the list.
With this implementation, 'push' will run in O(1) on average, since that is the running time of append(). 'pop' will always run in O(1).
Let's now study a broadly useful data structure that we might alternatively use to implement a stack, namely a linked list. A linked list looks like this:
An element of a linked list is called a node.
A node contains one or more values, plus a pointer to the next node
in the list. The first node of a linked list is called its head.
The last node of a linked list is its tail. The tail always
points to None
(in Python, or its equivalent such as nil
in other
languages).
By the way, we will sometimes illustrate a linked list more compactly:
2 → 4 → 7 → None
The two pictures above denote the same structure; the first is simply more detailed.
Note that a Python list is not a linked list! A Python list is an array. :)
Here is a node type for a linked list in Python:
class Node: def __init__(self, val, next): self.val = val self.next = next
We can build the 3-element linked list pictured above as follows:
>>> r = Node(7, None) >>> q = Node(4, r) >>> p = Node(2, q)
Now p
refers
to the head of the list, and q and r refer to successive elements:
Through p, we can get to the values in the list:
>>> p.val 2 >>> p.next.val 4 >>> p.next.next.val 7
We can traverse a linked list using a loop that moves to the next list node on each iteration. Here's a function that takes a pointer to the head of a linked list, and returns its length:
def list_len(head): n = head count = 0 while n != None: n = n.next # move to the next node count += 1 return count
By modifying the function only slightly, we can write a function that computes the sum of all values in a linked list:
def list_sum(head): n = head total = 0 while n != None: total += n.val n = n.next # move to the next node return total
Let's now write a function that takes an integer n and constructs a linked list containing the values from 1 to n. One easy way to do that is to build the list in reverse order, just like when we built a list above with the values 2, 4, and 7. On each step, we will prepend a node to the existing list.
def list_1_to_n(): head = None for i in range(n, 0, -1): # n, n - 1, ... 1 p = Node(i, head) head = p return head
As this function runs, the local variable 'head' points to the beginning of the list that we've built so far. On each loop iteration, the function allocates a new node by calling the Node constructor function. As it does so, it makes the new node point to the existing list, by passing 'head' as the value for the 'next' attribute. After that, it modifies 'head' to point to the newly allocated node.
We can write many more functions that operate on linked lists, performing operations such as inserting nodes, deleting nodes, and so on. To get practice with this, we will solve various linked list exercises in our tutorials.
We
can implement a stack using a linked list. Our
class LinkedStack
will
have an attribute
'head' that points to
the head of the list:
The stack operations are fairly straightforward:
class LinkedStack: def __init__(self): self.head = None def is_empty(self): return self.head == None def push(self, x): n = Node(x, self.head) self.head = n def pop(self): assert self.head != None, 'stack is empty' x = self.head.val self.head = self.head.next return x
Notice that the push() method allocates a node n that points to the previous head, then records n as the new list head. In other words, it prepends a new node to the list. This is similar to the activity in the list_1_to_n() function above.
If
we implement a stack using an array (i.e. a Python list), the push
operation will take
O(1) on average but O(N) in the worst case, where N is the current
stack size. Our linked list-based implementation performs
differently: push
always runs in O(1) (assuming that
object allocations run in constant time).
Queues are another important abstract data type. A queue provides two operations called 'enqueue' and 'dequeue'. enqueue(x) adds an element to the tail of a queue, and dequeue() removes the element at the head and returns it. A queue is something like people waiting in a line: you must join at the back of the line, and the person at the front of the line is served next.
In other words, queues are a first in first out (FIFO) data structure: the first value added to a queue will be the first one to be removed.
A queue can be used like this:
# initially 'q' is an empty queue q.enqueue(4) q.enqueue(77) q.enqueue(12) print(q.dequeue()) # writes 4 print(q.dequeue()) # writes 77
A naive implementation of a queue in Python will store elements in an array, i.e. a Python list. It might look like this:
class ArrayQueue: def __init__(self): self.a = [] def is_empty(self): return len(self.a) == 0 def enqueue(self, x): self.a.append(x) def dequeue(self): return self.a.pop(0)
With this implementation, is_empty() will run in O(1), and enqueue() will run in O(1) on average. However, dequeue() will run in O(N), because all array elements must shift leftward by one position when the first element is deleted. So this is a poor choice of implementation if N might be large.
We can implement a queue more efficiently using a linked list. To do so, we will keep pointers to both the head (first node) and tail (last node) in the list, which will allow us to enqueue or dequeue elements in O(1):
Here is an implementation:
class LinkedQueue: def __init__(self): self.head = self.tail = None def isEmpty(self): return self.head == None def enqueue(self, x): n = Node(x, None) if self.tail == None: self.head = self.tail = n else: self.tail.next = n self.tail = n def dequeue(self): assert self.head != None, 'queue is empty' x = self.head.val self.head = self.head.next if self.head == None: self.tail = None return x
We have already seen stacks and queues, two abstract data types. We've seen that it's possible to implement these abstract types using various concrete data structures. For example, we can implement a stack or queue either using an array or a linked list.
Let's now introduce another abstract data type. A set provides the following operations:
A set cannot contain the same value twice: every value is either present in the set or it is not.
This type should be familiar, because it has a default implementation in Python that we have seen and used.
In this algorithms course, we'll study various ways to implement sets. We'd like to understand how we could build efficient sets in Python if they were not already provided in the standard library.
A binary tree consists of a set of nodes. Each node contains a single value and may have 0, 1, or 2 children.
Here is a picture of a binary tree of integers. (Note that this is not a binary search tree, which is a special kind of binary tree that we will discuss later.)
In this tree, node 10 is the root node. 14 is the parent of 12 and 6. 12 is the left child of 14, and 6 is the right child of 14. 14 is an ancestor of 22, and 22 is a descendant of 14.
A node may have 0, 1, or 2 children. In this tree, node 15 has a right child but no left child.
The subtree rooted at 14 is the left subtree of node 10. The subtree rooted at 1 is the right subtree of node 10.
The nodes 12, 5, 22, 4, and 3 are leaves: they have no children. Nodes 10, 14, 1, 6, and 15 are internal nodes, which are nodes that have at least one child.
The depth of a node is its distance from the root. The root has depth 0. In this tree, node 15 has depth 2 and node 4 has depth 3. The height of a tree is the greatest depth of any node. This tree has height 3.
The tree with no nodes is called the empty tree.
Note that a binary tree may be asymmetric: the right side might not look at all like the left. In fact a binary tree can have any structure at all, as long as each node has 0, 1, or 2 children.
A binary tree is complete iff every internal node has 2 children and all leaves have the same height. Here is a complete binary tree of height 3:
A complete binary tree of height h has 2h leaves, and has 20 + 21 + … + 2h-1 = 2h – 1 interior nodes. So it has a total of 2h + 2h – 1 = 2h + 1 – 1 nodes. In this tree there are 23 = 8 leaves and 23 – 1 = 7 interior nodes, a total of 24 – 1 = 15 nodes.
Conversely if a complete binary tree has N nodes, then N = 2h + 1 – 1, where h is the height of the tree. And so h = log2(N + 1) – 1 ≈ log2(N) – 1 = O(log N).
We can represent a binary tree in Python using an object for each node, similarly to how we represent linked lists. Here is a Node class we can use for binary trees:
class Node: def __init__(self, val, left, right): self.val = val self.left = left # left child, or None if absent self.right = right # right child, or None if absent
We will generally refer to a tree using a pointer
to its root node. We'll use None
to represent the empty tree, just as we used None
for the empty linked list. In all leaf nodes, left
and
right
will be None
.
Here is a small binary tree with just 3 nodes:
We can build this tree in Python as follows:
l = Node(7, None, None) r = Node(5, None, None) root = Node(4, l, r)
Here's a recursive function that counts the number of nodes in a binary tree:
def size(n): if n == None: # empty tree return 0 return 1 + size(n.left) + size(n.right)
Let's try it on the tree we just built:
>>> size(root) 3
Recursion is a natural fit for trees, since the pattern of recursive calls in a function like this one can mirror the tree structure.
Here's a recursive function that computes the height of a binary tree:
def height(node): if node == None: return -1 return max(height(node.left), height(node.right)) + 1
Let's try it:
>>> height(root) 1
The constant -1 in this function may seem surprising. Recall that the height of a binary tree is defined as the greatest depth of any node. If a binary tree has only a single node, then the depth of that node is 0, so the tree's height is 0. And so if a tree is empty, meaning that it has no nodes at all, then for consistency it must have a height of one less than that, or -1.
As another example, let's write a function that generates a complete binary tree of a given height h. Each node of the tree will contain a random value from 0 .. 99. Once more we will use recursion:
def complete(h): if h == -1: return None left = complete(h - 1) # generate left subtree right = complete(h - 1) # generate right subtree return Node(randrange(100), left, right)
Now we can easily generate a large tree:
>>> t = complete(10) >>> size(t) 2047 >>> height(t) 10
A binary search tree is a binary tree in which the values are ordered in a particular way that makes searching easy. We began to discuss binary search trees in our lecture this week, but did not have time to cover them completely. We'll return to this topic in next week's lecture.