Lecture 14: Binary Search Trees

C SC 205 Lecture 14: Binary Search Trees
major resources: Data Structures and the Java Collections Framework Second Edition, William Collins, McGraw-Hill, 2005
Introduction to Programming and OO Design using Java, Niño and Hosch, Wiley & Sons, 2002


Wednesday October 31 (week 8)

The Ordering Property and Binary Search Trees

A binary tree has the ordering property if, for every node in the tree, every value in its left subtree is smaller than the node's value and every value in its right subtree is larger.
The above assumes there are no duplicate values in the tree. Multiple elements with equal value can easily be accommodated by defining either the left subtree to hold "smaller or equal" values OR the right subtree to hold "larger or equal" values.
A binary tree having the ordering property is known as a binary search tree.
Here's a recursive definition: A tree t rooted at root is a binary search tree if either t is empty OR (every value in root's left subtree is smaller than root's value AND every value in root's right subtree is larger than root's value AND both of root's subtrees are themselves binary search trees).
This requires that a natural ordering is defined for the element types
In Java, natural orders are defined by implementing the Comparable<T> interface
Binary search trees can be searched in logarithmic time, assuming they are balanced!
A balanced tree is one in which, at every node, the left and right subtrees have roughly the same height.
Interesting fact: An in-order traversal of a binary search tree will visit the nodes in increasing order of element value!
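
The in-order fact can be checked with a minimal sketch. The names Node, InOrderDemo and inOrder are hypothetical, chosen for illustration only, and the tree is built by hand so that it already has the ordering property.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal node type, for illustration only.
class Node {
    int value;
    Node left, right;

    Node(int value, Node left, Node right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }
}

class InOrderDemo {
    // Appends the values of the subtree rooted at n in in-order sequence:
    // left subtree, then n itself, then right subtree.
    static void inOrder(Node n, List<Integer> out) {
        if (n == null) {
            return;
        }
        inOrder(n.left, out);
        out.add(n.value);
        inOrder(n.right, out);
    }

    public static void main(String[] args) {
        // A small binary search tree:
        //        50
        //       /  \
        //     30    70
        //    /  \     \
        //  20    40    80
        Node root = new Node(50,
                new Node(30, new Node(20, null, null), new Node(40, null, null)),
                new Node(70, null, new Node(80, null, null)));
        List<Integer> visited = new ArrayList<>();
        inOrder(root, visited);
        System.out.println(visited); // prints [20, 30, 40, 50, 70, 80]
    }
}
```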

The `Comparable` interface

Defined in java.lang so you can refer to it without importing
Has only one method: int compareTo(T obj)
Use it to compare the invoking object to the argument object. To clarify, in the expression a.compareTo(b) , a is the invoking object and b is the argument object.
The return value is determined as follows:

Returns an int value less than 0 if the invoking object is less than the argument object
Returns 0 if the invoking object is equal to the argument object
Returns an int value greater than 0 if the invoking object is greater than the argument object

The method programmer determines what it means to be less than, equal to, or greater than. This imposes a natural ordering over objects of the implementing class
Guidelines state that it should return 0 if and only if equals() of the same objects returns true.
NOTE: If you are working with an element class that does not implement Comparable, you can alternatively use a Comparator object to do the job.

java.util.Comparator<T> is the interface to implement.
int compare(T obj1, T obj2) is the method to define.

Returns value < 0 if obj1 is less than obj2
Returns 0 if obj1 is equal to obj2 (obj1.equals(obj2) recommended but not required)
Returns value > 0 if obj1 is greater than obj2

See the Collections class static sort() method for an example. The overloaded sort() has an extra parameter for a Comparator, which will be used regardless of whether the list elements are Comparable or not.
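
A short sketch of both forms of Collections.sort(). The ByLength comparator is a hypothetical example, not from the textbook. Note that it returns 0 for two different strings of the same length, which is permitted since agreement with equals() is only recommended.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical comparator: orders strings by length, shortest first.
class ByLength implements Comparator<String> {
    @Override
    public int compare(String s1, String s2) {
        return Integer.compare(s1.length(), s2.length());
    }
}

class ComparatorDemo {
    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        words.add("pear");
        words.add("fig");
        words.add("banana");

        Collections.sort(words);                 // natural (alphabetical) order
        System.out.println(words);               // prints [banana, fig, pear]

        Collections.sort(words, new ByLength()); // comparator is used instead
        System.out.println(words);               // prints [fig, pear, banana]
    }
}
```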

NOTE: The Student class as defined in my textbook (first printing of second edition) pages 392-393 is pre-Java 5.0; it implements Comparable but should instead implement Comparable<Student>. Likewise the parameter of its compareTo() method should be Student instead of Object and the initial assignment with typecasting is not required.
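
For comparison, here is a sketch of what a generic version might look like. This is not the textbook's Student class: the fields (name, gpa) and the choice to order by name are assumptions made for illustration.

```java
// Hypothetical Student class; the fields and the ordering are assumptions.
class Student implements Comparable<Student> {
    private final String name;
    private final double gpa;

    Student(String name, double gpa) {
        this.name = name;
        this.gpa = gpa;
    }

    double getGpa() {
        return gpa;
    }

    // Natural ordering: alphabetical by name. The parameter is a Student,
    // so no cast from Object is needed.
    @Override
    public int compareTo(Student other) {
        return name.compareTo(other.name);
    }

    // Kept consistent with compareTo: equal exactly when compareTo returns 0.
    @Override
    public boolean equals(Object obj) {
        return obj instanceof Student && name.equals(((Student) obj).name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}
```

With this definition, new Student("Ann", 3.5).compareTo(new Student("Bob", 3.9)) returns a negative value, because "Ann" precedes "Bob" alphabetically.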

Searching in a Binary Search Tree

Suppose you wanted to search for a target value in a binary search tree. How would you proceed?

Compare target to value stored at root.
If target is equal to value stored at root, the search is over and successful.
If target is less than value stored at root, then the target must be contained in the root's left subtree, if it is in the tree at all. This is by definition of the ordering property.
Similarly, if target is greater than value stored at root, then the target must be contained in root's right subtree, if it is in the tree.
Therefore, the next comparison will be to the root's left or right child. That child is itself the root of a subtree, so the search process is repeated recursively from the first step, with the child as the new root.
Recursion stops when either the target value is found or the subtree root is null.
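
The steps above can be sketched as a recursive method. SearchNode and RecursiveSearch are hypothetical names for illustration; this is not the textbook's code.

```java
// Hypothetical node type for illustration.
class SearchNode<E extends Comparable<E>> {
    E element;
    SearchNode<E> left, right;

    SearchNode(E element) {
        this.element = element;
    }
}

class RecursiveSearch {
    // Returns true if target occurs in the subtree rooted at root.
    static <E extends Comparable<E>> boolean search(SearchNode<E> root, E target) {
        if (root == null) {
            return false;                      // subtree root is null: not found
        }
        int comp = target.compareTo(root.element);
        if (comp == 0) {
            return true;                       // found at this root
        }
        if (comp < 0) {
            return search(root.left, target);  // can only be in the left subtree
        }
        return search(root.right, target);     // can only be in the right subtree
    }
}
```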

How many comparisons are needed in the worst case (target not found)? It depends on the tree structure!

If the root happens to contain the smallest value and its right child contains the second-smallest value, and its right child contains the third-smallest value, etc, then the tree is unbalanced and the search process is linear, i.e. O(n).
If the tree is balanced, i.e. is complete or nearly so, then its depth is about log₂n levels. Since the search descends one level per comparison, the maximum number of comparisons is equal to the tree depth, or O(log₂n).
It is therefore important that the tree be balanced.
Later we will study techniques for assuring tree balance.
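
The difference between the two cases shows up in a small experiment. This is a sketch with hypothetical names (HeightNode, HeightDemo) that uses plain, non-balancing insertion: the same 15 values give a 15-level chain when inserted in ascending order, and a 4-level tree when inserted in an order that happens to keep it balanced.

```java
// Hypothetical node type for illustration.
class HeightNode {
    int value;
    HeightNode left, right;

    HeightNode(int value) {
        this.value = value;
    }
}

class HeightDemo {
    // Plain (non-balancing) BST insertion; duplicates are ignored.
    static HeightNode insert(HeightNode root, int value) {
        if (root == null) {
            return new HeightNode(value);
        }
        if (value < root.value) {
            root.left = insert(root.left, value);
        } else if (value > root.value) {
            root.right = insert(root.right, value);
        }
        return root;
    }

    // Number of levels in the tree; an empty tree has 0 levels.
    static int levels(HeightNode root) {
        if (root == null) {
            return 0;
        }
        return 1 + Math.max(levels(root.left), levels(root.right));
    }

    public static void main(String[] args) {
        HeightNode chain = null;
        for (int v = 1; v <= 15; v++) {
            chain = insert(chain, v);          // ascending order: every node goes right
        }
        int[] balancedOrder = {8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15};
        HeightNode bushy = null;
        for (int v : balancedOrder) {
            bushy = insert(bushy, v);          // middle values first: tree stays complete
        }
        System.out.println(levels(chain));     // prints 15
        System.out.println(levels(bushy));     // prints 4
    }
}
```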

Specification of `BinarySearchTree` Class

Java Collections Framework does not have a binary search tree class, but it is incorporated into tree-backed JCF classes TreeSet and TreeMap
Textbook defines: public class BinarySearchTree<E> extends AbstractSet<E>
AbstractSet itself implements Set, and "provides a skeletal implementation of the Set interface to minimize the effort required to implement this interface." (quoted from API)
Note that as a set, it does not permit duplicate (as determined by equals()) elements.
The constructors and methods of interest in the textbook include:

public BinarySearchTree()
public BinarySearchTree(BinarySearchTree<E> otherTree)
public int size()
public Iterator<E> iterator()
public boolean contains(Object obj)
public boolean add(E element)
public boolean remove(Object obj)

Desired running time: constant for size(), logarithmic for the others (assuming the tree is balanced)
All except the constructors are specified in the Set interface

Implementation of `BinarySearchTree` Class

As you might expect, having studied the linked list classes, there are two nested classes

Entry class to represent elements

reference to element
reference to parent Entry
reference to left child Entry
reference to right child Entry
a constructor
Note it is defined as a static class; it neither requires a BinarySearchTree object nor has access to non-static BinarySearchTree members
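
A minimal sketch of such an Entry class, nested in a hypothetical outer class named BstSkeleton. The field names, access modifiers and constructor parameters are assumptions; see the textbook for the actual declaration.

```java
class BstSkeleton<E> {
    protected Entry<E> root;   // null when the tree is empty
    protected int size;        // number of elements in the tree

    // Static nested class: it needs no enclosing BstSkeleton object and
    // cannot touch the non-static fields root and size.
    protected static class Entry<E> {
        protected E element;
        protected Entry<E> parent;
        protected Entry<E> left;
        protected Entry<E> right;

        // A new Entry starts out as a leaf: both children are null.
        Entry(E element, Entry<E> parent) {
            this.element = element;
            this.parent = parent;
        }
    }
}
```

Because Entry is static, an Entry can be created without any tree object, for example new BstSkeleton.Entry<>("m", null).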

TreeIterator class to implement iteration over a binary tree

Note that iterations are sequential (linear) and trees are not!
Iteration will be defined as in-order traversal, which for a BST is the elements' natural order
Constructor positions iterator to "furthest left Entry in left subtree"
next() needs to advance position to the successor Entry (details later)
remove() removes Entry most recently returned by next()

The contains() method needs to search for matching element.

Should it use an iterator, or should it conduct a "binary search" starting at the root?
Consider recursive binary search; can the contains() method itself be recursive? Hint: Each recursive call must be given an Entry that serves as a subtree root.
Sketch out a solution
Text shows both an iterative and a recursive version.
Collections Framework methods tend to be iterative rather than recursive for performance reasons.

The add() method must assure that upon completion the tree still has the ordering property.
The remove() method must assure that upon completion the tree still has the ordering property.
Details for these three methods follow.

Implementing the `contains()` and `add()` methods

Note the similarity in these methods:

The contains() method uses the binary search strategy to find a target value.
If the target is not in the tree, the search terminates at the position where it would be, if it were present.
The add() method assumes the new element is not in the tree and finds the position where it should be placed.
Thus add() should follow the same general algorithm as contains().

The algorithm can be described as follows:

1. Start with the root of the tree.
2. If the root is null, we are at the final position (add the element or terminate the unsuccessful search). Otherwise, continue on to step 3.
3. Compare the new/target element with the element stored at the root.

If the new/target element is less than the root's element, then we need to travel down its left subtree to continue our search.
If the new/target element is greater than the root's element, then we need to travel down its right subtree to continue our search.
If the new/target element is equal to the root's element, then the search is over (add() returns false and leaves the tree unchanged, contains() succeeds).

4. To continue the search, make the left or right child (as chosen in step 3) the new root, then go back to step 2.
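
The shared algorithm can be sketched iteratively as follows. SimpleBst is a hypothetical class, not the textbook's: to keep the sketch free of casts it bounds E by Comparable<E> and lets contains() take an E rather than an Object, and its Entry omits the parent reference.

```java
class SimpleBst<E extends Comparable<E>> {
    private static class Entry<E> {
        E element;
        Entry<E> left, right;

        Entry(E element) {
            this.element = element;
        }
    }

    private Entry<E> root;
    private int size;

    public int size() {
        return size;
    }

    public boolean contains(E target) {
        Entry<E> current = root;                           // start with the root
        while (current != null) {                          // null: unsuccessful search
            int comp = target.compareTo(current.element);  // compare with current root
            if (comp == 0) {
                return true;
            }
            current = (comp < 0) ? current.left : current.right;  // descend one level
        }
        return false;
    }

    public boolean add(E element) {
        if (root == null) {                                // empty tree: new root
            root = new Entry<>(element);
            size++;
            return true;
        }
        Entry<E> current = root;
        while (true) {
            int comp = element.compareTo(current.element);
            if (comp == 0) {
                return false;                              // already present: no change
            }
            if (comp < 0) {
                if (current.left == null) {                // final position found
                    current.left = new Entry<>(element);
                    size++;
                    return true;
                }
                current = current.left;
            } else {
                if (current.right == null) {               // final position found
                    current.right = new Entry<>(element);
                    size++;
                    return true;
                }
                current = current.right;
            }
        }
    }
}
```

The only structural difference is that add() must stop one level early, while it still holds the parent of the null position, so that it can attach the new Entry there.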

Implementing the `remove()` method

Its parameter is the element, not the Entry

First task is to find that element's Entry
contains() conducts the search but does not return the Entry. What to do?

Copy/paste the search algorithm from contains()? Easy, but...
Encapsulate the algorithm into a protected/private method, then have both contains() and remove() call it?

The second approach is preferred.

It exemplifies single-point control
A.k.a. "Write once, use many times."
A simple example of single-point control is the first of the two versions below.
If the array size needs to be changed, the change is made in one place:

Single-point control:
final int LIMIT = 100;
int[] scores = new int[LIMIT];
. . .
for (i=0; i<scores.length; i++)

Poor programming practice:
int[] scores = new int[100];
. . .
for (i=0; i<100; i++)
Other examples include modular programming (write a method/function once and use it all over the place), OOP itself (encapsulation in particular), and table driven control.
It is a major component of the practice of refactoring. See the book Refactoring by Martin Fowler.
The major goal of single-point control and refactoring is to make software more maintainable.
My first exposure to the term "single-point control" came from the "Dear Professor..." article by Robert Glass in the April 1989 Software magazine.
A Google search for "single point control" Glass led to the article at http://findarticles.com/p/articles/mi_m0SMG/is_n5_v9/ai_7242466

Textbook follows this approach by defining protected Entry<E> getEntry(Object obj) and rewriting contains() to call it.
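
A sketch of the idea, in a hypothetical class named LookupBst. Unlike the textbook's signature, getEntry() here takes an E instead of an Object so that no cast is needed; the body is an assumption, not the textbook's code.

```java
class LookupBst<E extends Comparable<E>> {
    static class Entry<E> {
        E element;
        Entry<E> left, right, parent;

        Entry(E element, Entry<E> parent) {
            this.element = element;
            this.parent = parent;
        }
    }

    Entry<E> root;

    // The single place where the search algorithm lives.
    // Returns the Entry holding target, or null if target is not in the tree.
    protected Entry<E> getEntry(E target) {
        Entry<E> e = root;
        while (e != null) {
            int comp = target.compareTo(e.element);
            if (comp == 0) {
                return e;
            }
            e = (comp < 0) ? e.left : e.right;
        }
        return null;
    }

    // contains() only needs to know whether an Entry was found.
    public boolean contains(E target) {
        return getEntry(target) != null;
    }
}
```

A remove() method would start with the same getEntry() call and, if the result is not null, go on to unlink that Entry.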

Once you have a reference to the Entry, you can proceed to remove it.
This is reasonably complex due to the different possibilities

Entry E being removed is a leaf
Entry E being removed has one child
Entry E being removed has two children

Entry E being removed is a leaf

If E is the tree's root, set new root to null (empty tree)
If E is not the tree's root, replace the reference E's parent has to E with null.

Entry E being removed has one child

If E is the tree's root, set new root to be E's non-null child.
If E is not the tree's root, replace the reference E's parent has to E with E's reference to its non-null child. (adoption)

Entry E being removed has two children

Locate E's successor S (leftmost Entry in E's right subtree), replace E's element with S's element, then recursively remove S.
Note that E's successor S cannot itself have two children! Why not? So the removal of S falls into the first or second case.
Note that E's successor S cannot be the root.

Removal operation is also needed by TreeIterator's remove() method, and so is defined in protected Entry<E> deleteEntry(Entry<E> p) and called by both methods.
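
The three cases can be sketched as follows. RemovableBst is a hypothetical class and this deleteEntry() is an illustration, not the textbook's source. It finds the successor inline (leftmost Entry in the right subtree) and handles the two-children case by redirecting p to the successor rather than by a recursive call.

```java
class RemovableBst<E extends Comparable<E>> {
    static class Entry<E> {
        E element;
        Entry<E> left, right, parent;

        Entry(E element, Entry<E> parent) {
            this.element = element;
            this.parent = parent;
        }
    }

    Entry<E> root;

    // Plain insertion that maintains parent links; returns false on a duplicate.
    boolean add(E element) {
        if (root == null) {
            root = new Entry<>(element, null);
            return true;
        }
        Entry<E> e = root;
        while (true) {
            int comp = element.compareTo(e.element);
            if (comp == 0) {
                return false;
            }
            Entry<E> next = (comp < 0) ? e.left : e.right;
            if (next == null) {
                Entry<E> fresh = new Entry<>(element, e);
                if (comp < 0) {
                    e.left = fresh;
                } else {
                    e.right = fresh;
                }
                return true;
            }
            e = next;
        }
    }

    // Removes the element held by p; returns the Entry actually unlinked.
    Entry<E> deleteEntry(Entry<E> p) {
        if (p.left != null && p.right != null) {
            // Two children: copy the successor's element into p,
            // then unlink the successor instead.
            Entry<E> s = p.right;
            while (s.left != null) {
                s = s.left;
            }
            p.element = s.element;
            p = s;                          // s has no left child
        }
        // p now has zero or one child.
        Entry<E> child = (p.left != null) ? p.left : p.right;
        if (child != null) {
            child.parent = p.parent;        // adoption by p's parent
        }
        if (p.parent == null) {
            root = child;                   // p was the root
        } else if (p == p.parent.left) {
            p.parent.left = child;
        } else {
            p.parent.right = child;
        }
        return p;
    }
}
```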

Finding the successor to a given Entry

This is needed both by remove() and by TreeIterator's next()
Since it is needed in multiple places it is defined in protected Entry<E> successor(Entry<E> e)
There are three relevant situations for getting the successor to Entry E

E is null
E has a right child
E does not have a right child

If E is null, its successor S is null.
If E has a right child, its successor S will be the leftmost Entry in E's right subtree (smallest value greater than E's value)
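If E does not have a right child, its successor S is found by tracing upward through its ancestors: starting with P = E, move P to its parent until P is itself a left child; P's parent is then the successor to E. If the root is reached without finding such a P, E holds the largest value and its successor is null.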
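
The three situations can be sketched as a static method. SuccessorDemo and its Entry class are hypothetical names; this is an illustration under the assumption that every Entry keeps a parent reference, not the textbook's source.

```java
class SuccessorDemo {
    static class Entry<E> {
        E element;
        Entry<E> left, right, parent;

        Entry(E element, Entry<E> parent) {
            this.element = element;
            this.parent = parent;
        }
    }

    // Returns the in-order successor of e, or null if there is none.
    static <E> Entry<E> successor(Entry<E> e) {
        if (e == null) {
            return null;                        // situation 1: e is null
        }
        if (e.right != null) {
            Entry<E> s = e.right;               // situation 2: leftmost Entry
            while (s.left != null) {            // in e's right subtree
                s = s.left;
            }
            return s;
        }
        Entry<E> child = e;                     // situation 3: climb while we
        Entry<E> p = e.parent;                  // arrive from a right child
        while (p != null && child == p.right) {
            child = p;
            p = p.parent;
        }
        return p;                               // null if e held the largest value
    }
}
```

Starting at the leftmost Entry and calling successor() repeatedly until it returns null visits every Entry in increasing order, which is what TreeIterator's next() relies on.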

I have presented natural language descriptions of these algorithms; see textbook for source code

Peter Sanderson (PSanderson@otterbein.edu)
