Week 11: Heaps and Trees as Arrays
Today we’ll learn about heaps. A heap structure is a tree that has a heap invariant. We’ll be focusing on binary min-heaps, which are a nice way to implement priority queues, a common way of managing tasks of varying priorities: elements are added in any order, but the “most important” elemnt is taken out each time. (For min-heaps, “most important” means “smallest”.)
It may help to think of priority queues as working on pairs of values and priorities, e.g., you might queue up tasks for the day, with lower priority meaning more important:
- video games, 7
- class, priority 3
- eat, priority 2
- laundry, priority 6
If enqueued in a priority queue in any order, these would come out in the order “eat”, “class”, “video games”, “laundry”.
Trees in text
But first, let’s get a textual notation for our trees.
A binary tree t
is either:
empty
, the empty tree, ornode(v, l, r)
, a node with valuev
, left childl
, and right childr
.
Defining heaps
A binary tree t
is a min-heap when:
t
is is complete (i.e., every level is full except for possibly- for every
node(v, l, r)
int
, the valuev
is less than or equal to the values inl
andr
Contrast the min-heap property with the search property. The search property amounts to “is sorted when traversed in-order”. Here, the min-heap property amounts to “the top most value is king of the hill, i.e., the minimal value in the whole tree”.
A few things:
- Unlike a search tree, min-heaps can contain duplicates. The root of a min-heap is minimal, but not the minimum.
- Search trees have some notion of order, but min-heaps have no relation between the left and right subtrees.
- Search trees can be wildly imbalanced, but a min-heap is always balanced (because complete trees are always balanced).
You can take our notion of min-heap and convert
Exercise 1
Draw a min-heap with 7 elements. Now draw one with 8 elements. How many levels does each have?
Operations on min-heaps
We’re using min-heaps to implement priority queues, so we really only need two operations:
enqueue
a task with some prioritydequeue
the next task (with minimal priority)
The enqueue
operation corresponds to adding a new element to the
min-heap. The tricky thing here is that we need to maintain the
min-heap invariant, which means keeping our tree complete, i.e., all
levels full except the last, which fills from the left.
The dequeue
operation corresponds to removing a minimal element from
the min-heap. This should be easy: the root of the tree is minimal!
But just like the remove
operation on BSTs, we have to fix up the
tree: if we have the tree node(v, l, r)
and we remove v
, who is
the new root?
It turns out both of these operations can be implemented simply using a notion of “fake it til you make it”. We’ll deliberately misplace elements in the heap (to maintain the completeness invariant) and then fix the tree up. To restore the ‘min’ part of the min-heap.
The enqueue
operation a/k/a insert
Concretely, for enqueue
, we’ll put the element in the appropriate
position: the leftmost open spot on the last level (or as the leftmost
element in a new level). Then we’ll start at that node and work our
way up, asking: hey, is the node here less than its parent? If so,
swap their values! We keep swapping until we’ve found a good spot.
Here’s an example. Suppose we have the follow min-heap (verify that it’s a min-heap!):
5
/ \
10 6
/
11
Suppose we want to enqueue
4 into this tree. The next available spot
is the right child of the 10, so we’ll get this tree:
5
/ \
10 6
/ \
11 4
Uh oh… that’s not a min-heap! So we have to fix it up. We’ll start at 4 and work our way up. We compare 4 with its parent, 10, and see that 4 is less… so we swap it up, finding:
5
/ \
4 6
/ \
11 10
We still don’t have a min-heap. We compare 4 with its parent again, finding that 4 < 5. So we swap once more:
4
/ \
5 6
/ \
11 10
Whew—fixed!
Exercise 2
Draw a min-heap with 7 nodes (you can use your earlier one). Find elements that take 0, 1, 2, and 3 swaps respectively. Could an element take 4 swaps?
The enqueue
algorithm
Here’s some pseudocode for how to add something an element e
to a
min-heap:
- Put the element in the last valid position, i.e., the rightmost unused child node in the last level, or as the leftmost child in a new level. (Observe that the tree is still complete.)
- While the new node is strictly less than its parent, swap their elements and continue traversing upward until either (a) the parent is less than or equal to the new node, or (b) you’ve reached the root.
The algorithm has two phases: (1) insert the element, and (2) fix your mistakes. It’s O(log(n)), where n is the number of elements… because we swap from a leaf on a complete tree possibly up to the root—and we can only do that log(n) times before reaching the root, since a complete tree with n elements is log(n) high.
The dequeue
operation a/k/a removeMin
Similarly, to dequeue
, we’ll pop off the root—that’s the minimal
element we want to dequeue—and we’ll replace it with the very last
element in the heap, i.e, the rightmost non-empty node in the last
level. Now, that’s probably a terrible choice for our root. So we’ll
swap it down, pushing it towards whichever of its children are
minimal.
Here’s another example. Suppose we start with the same tree:
5
/ \
10 6
/
11
Finding the minimum here is even easier than in search trees: it’s
just the root! We’ll return 5 in the end, but first… we have to
restore the invariant. We currently have this tree with a hole in it,
marked XXX
:
XXX
/ \
10 6
/
11
We’ll pick the ‘last’ element in the heap—that is, the rightmost element on the last level. Here, that’s 11. So now we have this tree:
11
/ \
10 6
Unfortunately… it’s not a min-heap. We need to do some swapping. We look at our new root and ask: which child is the minimum? Here, it’s 6, because 6 < 10. If the minimum child is less than our node, we’ll swap. We continue this process until the minimum child is no greater than our node (or we have no children).
Since 6 < 11, we’ll swap, yielding:
6
/ \
10 11
A min-heap… success!
Exercise 3
Draw a min-heap that, when you dequeue
, will swap 3 times. Before
you start… how many levels do you need to start with to make 3 swaps
possible?
The dequeue
algorithm
Here’s some pseudocode for how to get a minimal element out of a min-heap:
-
Save the root element to return later.
-
Take the last element in the heap—the rightmost element in the lowest level—and make it the new root. (Observe that the tree is still complete).
-
Starting at the root:
a. Compute minimum of the two children of the current node.
b. If the minimum is greater than the value at the current node, stop and return the original root saved earlier.
c. Swap the current node and the minimum child’s values. Continue with (a), with the swapped child selecting the next ‘current’ node (i.e., if you the left child was the minimum, continue down the left subtree).
The algorithm has three phases: (1) remove the root and save it to return, (2) move the last element to the root, and (3) fix your mistakes. It’s O(log(n)), where n is the number of elements… because we swap from the root on a complete tree possibly down to a leaf—and we can only do that log(n) times before reaching a leaf, since a complete tree with n elements is log(n) high.
Implementing trees using arrays
As you’ve been thinking about these algorithms, you may wonder… how
am I supposed to find the last-in-heap value? Isn’t that expensive?
It’ll be hard to keep track of! It doesn’t seem fun to implement
min-heaps using Node
s…
So, we won’t. Complete trees correspond really nicely to an array
representation, which makes it easy to find the “last-in-heap” value
for dequeue
and the next available spot for enqueue
.
Here’s the idea: we’ll flatten things in level order, left to right. So consider the following tree:
a
/ \
b c
/ \ / \
d e f g
We can represent it as the following array:
index | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
value | a | b | c | d | e | f | g |
Here’s another example:
a
/ \
b z
/ \
m g
index | 0 | 1 | 2 | 3 | 4 |
value | a | b | z | m | g |
Observe a few things:
- The root is at index 0.
- The left and right children of a node come after it.
- Nodes come in level order.
This representation has some nice properties:
- The last element in the heap (for
dequeue
) is just the last element in the array. - The next blank spot (for
enqueue
) is just one past the last element of the array.
So we can implement a min-heap’s tree using… an ArrayList
!
Finding parents and children
Concretely, we can use these rules to write down a complete tree as an array:
-
The root node has index 0.
-
If a node has index i, then:
- its left child has index 2i + 1, and
- its right child has index 2i + 2.
So the left child of index 0 is going to be 2*0+1=1, i.e., index 1. The right child of index 0 is going to be 2*0+2=2.
If a node has index i, its parent has index (i - 1)/2. Note that we mean integer division, which truncates. Let’s check our work:
- The node at index 1 has the parent (1 - 1)/2 = 0/2 = 0. Correct!
- The node at index 2 has the parent (2 - 1)/2 = 1/2 = 0. Correct again!
Finally, note that the root node is its own parent: (0 - 1)/2 = -1/2 = 0. Be careful to avoid infinite loops!
Exercise 4
Encode your 7-value min-heap into an array.
Tying it all together
Implement a min-heap with the following signature:
public class Heap<E extends Comparable<? super E>> {
private ArrayList<E>
public Heap(); // starts empty
public int size();
public void insert(E);
public E removeMin(); // should throw an exception if empty
}
You’ll find it helpful to define (private
) methods for swapping
indices, as well as computing parent/child indices.