C SC 205 Lecture 14: Binary Search Trees
major resources: Data Structures and the Java Collections Framework Second Edition,
William Collins, McGraw-Hill, 2005
Introduction to Programming and OO Design
using Java, Niño and Hosch, Wiley & Sons, 2002
Wednesday October 31 (week 8)
The Ordering Property and Binary Search Trees
- A binary tree has the ordering property if, for
every node in the tree, every value in its left subtree is smaller than the node's value and every value
in its right subtree is larger. Comparing each node with only its immediate children is not enough.
- The above assumes there are no duplicate values in the tree. Multiple elements with equal
values can easily be accommodated by either defining the left subtree to hold values "smaller or equal" OR defining the right subtree to hold values
"larger or equal".
- A binary tree having the ordering property is known as a binary search tree
- Here's a recursive definition: A tree t rooted at root is a binary search tree if either
t is empty OR (every value in root's left subtree is smaller than root's value AND every value in root's right subtree is larger than root's value
AND root's left and right subtrees are both binary search trees)
- This requires that a natural ordering is defined for the element types
- In Java, natural orders are defined by implementing the Comparable<T> interface
- Binary search trees can be searched in logarithmic time, assuming they are balanced!
- Roughly speaking, a tree is balanced when, at every node, the left and right subtrees have about the same height. This keeps the tree's depth proportional to log n for n nodes.
- Interesting fact: An in-order traversal of a binary search tree
will visit the nodes by increasing order of element value!
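The recursive definition and the in-order fact can be sketched in Java as follows. This is a minimal illustration, not the textbook's code. Node, node(), isBST(), and inOrder() are hypothetical names. The check passes a lower and an upper bound down the recursion, so every value is compared against all of its ancestors and not only its parent. The second example tree shows why checking only parent/child pairs is not enough.

```java
import java.util.ArrayList;
import java.util.List;

class BstProperty {
    // Hypothetical minimal node type (no parent link), used only for this illustration.
    static class Node<E> {
        E element;
        Node<E> left, right;

        Node(E element, Node<E> left, Node<E> right) {
            this.element = element;
            this.left = left;
            this.right = right;
        }
    }

    // Convenience factory for building example trees of Integers.
    static Node<Integer> node(int value, Node<Integer> left, Node<Integer> right) {
        return new Node<>(value, left, right);
    }

    // True if t is empty, or every element in t lies strictly between lo and hi
    // (null means "no bound") and both subtrees are binary search trees.
    static <E extends Comparable<? super E>> boolean isBST(Node<E> t, E lo, E hi) {
        if (t == null) return true;
        if (lo != null && t.element.compareTo(lo) <= 0) return false;
        if (hi != null && t.element.compareTo(hi) >= 0) return false;
        // Left subtree must stay below t.element; right subtree must stay above it.
        return isBST(t.left, lo, t.element) && isBST(t.right, t.element, hi);
    }

    // In-order traversal: left subtree, then the node itself, then right subtree.
    static <E> void inOrder(Node<E> t, List<E> out) {
        if (t == null) return;
        inOrder(t.left, out);
        out.add(t.element);
        inOrder(t.right, out);
    }

    public static void main(String[] args) {
        //        50
        //      /    \
        //    30      70
        //   /  \
        // 20    40
        Node<Integer> good = node(50, node(30, node(20, null, null), node(40, null, null)),
                                  node(70, null, null));
        // Same shape, but 60 sits in 50's LEFT subtree: every parent/child pair
        // looks fine (60 > 30), yet the ordering property is violated (60 > 50).
        Node<Integer> bad = node(50, node(30, node(20, null, null), node(60, null, null)),
                                 node(70, null, null));

        System.out.println(isBST(good, null, null)); // true
        System.out.println(isBST(bad, null, null));  // false

        List<Integer> visited = new ArrayList<>();
        inOrder(good, visited);
        System.out.println(visited);                 // [20, 30, 40, 50, 70]
    }
}
```

The in-order traversal of the valid tree prints its elements in increasing order.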
The Comparable interface
- Defined in java.lang so you can refer to it without importing
- Has only one method: int compareTo(T obj)
- Use it to compare the invoking object to the argument object. To clarify, in the
expression a.compareTo(b) , a is the invoking object and b is the argument object.
- The return value is determined as follows:
- Returns an int value less than 0 if the invoking object is less than the argument object
- Returns 0 if the invoking object is equal to the argument object
- Returns an int value greater than 0 if the invoking object is greater than the argument object
- The programmer who implements the method determines what it means to be less than, equal to, or greater than. This imposes a natural ordering over objects of the implementing class.
- The API documentation strongly recommends (but does not require) that it return 0 if and only if equals() on the same two objects returns true.
- NOTE: If you are working with an element class that does not implement Comparable,
you can alternatively use a Comparator object to do the job.
- java.util.Comparator<T> is the interface to implement.
- int compare(T obj1, T obj2) is the method to define.
- Returns value < 0 if obj1 is less than obj2
- Returns 0 if obj1 is equal to obj2 (obj1.equals(obj2) recommended but not required)
- Returns value > 0 if obj1 is greater than obj2
- See the Collections class static sort() method for an example. The overloaded sort()
has an extra parameter for a Comparator, which will be used regardless of whether the list elements are
Comparable or not.
- NOTE: The Student class as defined in my textbook (first printing of second edition) pages 392-393 is pre-Java 5.0; it
implements Comparable but should instead implement Comparable<Student>. Likewise
the parameter of its compareTo() method should be Student instead of Object and
the initial assignment with typecasting is not required.
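Below is a sketch of both approaches. The Student class here is hypothetical and is not the textbook's class. Its natural order is by id, through Comparable<Student>, so compareTo() takes a Student and needs no cast. A Comparator supplies an alternative order by name for the overloaded Collections.sort(). The lambda syntax requires Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class StudentDemo {
    // Hypothetical Student class: natural order is by id.
    static class Student implements Comparable<Student> {
        private final String name;
        private final int id;

        Student(String name, int id) {
            this.name = name;
            this.id = id;
        }

        String getName() { return name; }
        int getId() { return id; }

        // Parameter type is Student, not Object, so no typecast is needed.
        @Override
        public int compareTo(Student other) {
            return Integer.compare(id, other.id);
        }

        // Consistent with compareTo(): two Students are equal exactly when their ids match.
        @Override
        public boolean equals(Object obj) {
            return obj instanceof Student && ((Student) obj).id == id;
        }

        @Override
        public int hashCode() { return Integer.hashCode(id); }

        @Override
        public String toString() { return name + "(" + id + ")"; }
    }

    // A Comparator imposes a different ordering (by name) without changing Student.
    static final Comparator<Student> BY_NAME =
        (s1, s2) -> s1.getName().compareTo(s2.getName());

    public static void main(String[] args) {
        List<Student> roster = new ArrayList<>();
        roster.add(new Student("Carol", 3));
        roster.add(new Student("Alice", 2));
        roster.add(new Student("Bob", 1));

        Collections.sort(roster);           // natural order, via compareTo()
        System.out.println(roster);         // [Bob(1), Alice(2), Carol(3)]

        Collections.sort(roster, BY_NAME);  // the Comparator overrides the natural order
        System.out.println(roster);         // [Alice(2), Bob(1), Carol(3)]
    }
}
```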
Searching in a Binary Search Tree
- Suppose you wanted to search for a target value in a binary search tree. How would you proceed?
- Compare target to value stored at root.
- If target is equal to value stored at root, the search is over and successful.
- If target is less than value stored at root, then the target must be contained in the root's
left subtree, if it is in the tree at all. This is by definition of the ordering property.
- Similarly, if target is greater than value stored at root, then the target must be contained in
root's right subtree, if it is in the tree.
- Therefore, the next comparison will be to the root's left or right child. These nodes are themselves
the roots of subtrees, so the search process is repeated recursively, starting again with the comparison against the subtree's root.
- Recursion stops when either the target value is found or the subtree root is null.
- How many comparisons are needed in the worst case (target not found)? It depends on the
tree structure!
- If the root happens to contain the smallest value, its right child contains
the second-smallest value, that child's right child contains the third-smallest value, etc., then the
tree is a linear chain and the search process is linear, i.e. O(n).
- If the tree is balanced, e.g. it is complete or nearly so, then its depth is about
log₂ n levels. Since the search descends one level per comparison, the maximum number
of comparisons is about the tree depth, i.e. O(log n).
- It is therefore important that the tree be balanced.
- Later we will study techniques for assuring tree balance.
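To see the difference concretely, the sketch below counts how many nodes a recursive search examines when the target is absent. SearchCost, insert(), countingSearch(), and buildBalanced() are hypothetical names. The insert() shown is a plain unbalanced insertion, used only to build two trees from the same 1023 keys. The first tree is built from sorted input and becomes a linear chain. The second is built middle-first and is perfectly balanced.

```java
class SearchCost {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Plain (unbalanced) BST insertion, used only to build the example trees.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return root;
    }

    // Recursive search that returns the number of nodes examined.
    static int countingSearch(Node root, int target) {
        if (root == null) return 0;                  // fell off the tree: not found
        if (target == root.key) return 1;            // found at this node
        if (target < root.key) return 1 + countingSearch(root.left, target);
        return 1 + countingSearch(root.right, target);
    }

    // Inserts lo..hi middle-first, which yields a balanced tree.
    static Node buildBalanced(Node root, int lo, int hi) {
        if (lo > hi) return root;
        int mid = (lo + hi) / 2;
        root = insert(root, mid);
        root = buildBalanced(root, lo, mid - 1);
        return buildBalanced(root, mid + 1, hi);
    }

    public static void main(String[] args) {
        int n = 1023;
        Node chain = null;
        for (int k = 1; k <= n; k++) chain = insert(chain, k);   // sorted input: a chain
        Node balanced = buildBalanced(null, 1, n);

        // Worst case: the target is not in the tree.
        System.out.println(countingSearch(chain, n + 1));     // 1023
        System.out.println(countingSearch(balanced, n + 1));  // 10
    }
}
```

For the same 1023 keys, the unsuccessful search examines all 1023 nodes in the chain but only 10 nodes (log₂ 1024) in the balanced tree.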
Specification of BinarySearchTree Class
- The Java Collections Framework does not have a general-purpose binary search tree class, but binary search trees (specifically red-black trees, which keep themselves balanced) back the JCF classes TreeSet
and TreeMap
- Textbook defines: public class BinarySearchTree<E>
extends AbstractSet<E>
- AbstractSet itself implements Set, and "provides
a skeletal implementation of the Set interface to minimize the effort required to implement this interface." (quoted from API)
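- Note that as a set, it does not permit duplicate elements. The Set interface defines duplicates using equals(), but a binary search tree detects them with compareTo(), which is one reason compareTo() should be consistent with equals().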
- The constructors and methods of interest in the textbook include:
- public BinarySearchTree()
- public BinarySearchTree(BinarySearchTree<E> otherTree)
- public int size()
- public Iterator<E> iterator()
- public boolean contains(Object obj)
- public boolean add(E element)
- public boolean remove(Object obj)
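- Desired running time: constant for size(); logarithmic for iterator(), contains(), add(), and remove(), assuming the tree is balanced (the copy constructor must visit every element, so it is linear)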
- All except the constructors are specified in the Set interface
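You can try the specified Set behavior with java.util.TreeSet, the JCF class mentioned above, because it offers the same size(), iterator(), contains(), add(), and remove() methods. This demonstrates TreeSet, not the textbook's BinarySearchTree. The words are arbitrary sample data.

```java
import java.util.Iterator;
import java.util.TreeSet;

class TreeSetDemo {
    public static void main(String[] args) {
        TreeSet<String> set = new TreeSet<>();
        System.out.println(set.add("pear"));     // true
        System.out.println(set.add("apple"));    // true
        System.out.println(set.add("fig"));      // true
        System.out.println(set.add("apple"));    // false: duplicates are rejected
        System.out.println(set.size());          // 3
        System.out.println(set.contains("fig")); // true
        System.out.println(set);                 // [apple, fig, pear] (in-order iteration)

        // Iterator.remove() deletes the element most recently returned by next().
        Iterator<String> it = set.iterator();
        while (it.hasNext()) {
            if (it.next().startsWith("f")) it.remove();
        }
        System.out.println(set);                 // [apple, pear]
        System.out.println(set.remove("kiwi"));  // false: not present
    }
}
```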
Implementation of BinarySearchTree Class
- As you might expect, having studied the linked list classes, there are two nested classes
- Entry class to represent elements
- reference to element
- reference to parent Entry
- reference to left child Entry
- reference to right child Entry
- a constructor
- Note it is defined as a static class; it neither requires a BinarySearchTree
object nor has access to non-static BinarySearchTree members
- TreeIterator class to implement iteration over a binary tree
- Note that iteration is sequential (linear) and trees are not!
- Iteration will be defined as an in-order traversal, which for a BST visits elements in their natural order
- Constructor positions the iterator at the leftmost Entry in the tree (the one holding the smallest element)
- next() needs to advance position to the successor Entry (details later)
- remove() removes Entry most recently returned by next()
- The contains() method needs to search for matching element.
- Should it use an iterator, or should it conduct a "binary search" starting at the root?
- Consider recursive binary search; can the contains() method itself be recursive?
Hint: Each recursive call must be given an Entry that serves as a subtree root.
- Sketch out a solution
- Text shows both an iterative and a recursive version.
- Java Collections Framework methods tend to be iterative rather than recursive, for performance reasons.
- The add() method must assure that upon completion the tree still has the ordering property.
- The remove() method must assure that upon completion the tree still has the ordering property.
- Details for these three methods follow.
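Below is a sketch of the Entry design and the hint about a recursive contains(). This is a minimal illustration under stated assumptions, not the textbook's code. The class name EntrySketch and the leftmost() helper are hypothetical. contains() takes an E rather than an Object to avoid a cast. Because this sketch has no add(), the tree is linked together by hand.

```java
class EntrySketch<E extends Comparable<? super E>> {

    // static: an Entry needs no enclosing EntrySketch object.
    static class Entry<E> {
        E element;
        Entry<E> parent, left, right;

        Entry(E element, Entry<E> parent) {
            this.element = element;
            this.parent = parent;
        }
    }

    Entry<E> root;

    // The public method has no subtree-root parameter, so it cannot recurse by itself.
    // It starts the recursion at the root through a private helper that takes an Entry.
    public boolean contains(E target) {
        return contains(root, target);
    }

    private boolean contains(Entry<E> subtreeRoot, E target) {
        if (subtreeRoot == null) return false;          // empty subtree: not found
        int c = target.compareTo(subtreeRoot.element);
        if (c == 0) return true;
        return contains(c < 0 ? subtreeRoot.left : subtreeRoot.right, target);
    }

    // Leftmost Entry = smallest element = where an in-order iterator would start.
    Entry<E> leftmost() {
        Entry<E> e = root;
        if (e == null) return null;
        while (e.left != null) e = e.left;
        return e;
    }

    public static void main(String[] args) {
        EntrySketch<Integer> t = new EntrySketch<>();
        t.root = new Entry<>(50, null);
        t.root.left = new Entry<>(30, t.root);
        t.root.right = new Entry<>(70, t.root);
        t.root.left.left = new Entry<>(20, t.root.left);

        System.out.println(t.contains(20));        // true
        System.out.println(t.contains(60));        // false
        System.out.println(t.leftmost().element);  // 20
    }
}
```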
Implementing the contains() and add() methods
- Note the similarity in these methods:
- The binary search strategy used by contains()
to find a target value.
- If the target is not in the tree, the search terminates at the position where
it would be, if it were present.
- The add() method searches the same way for the position where the new element belongs; if the search ends at an empty (null)
position, the element is not yet in the tree and is placed there.
- Thus add() should follow the same general algorithm as contains().
- The algorithm can be described as follows:
1. Start with the root of the tree.
2. If the root is null, we are at the final position: contains() terminates its unsuccessful search, and add() attaches the new element there (as the left or right child of the last Entry visited, or as the tree's root if the tree is empty). Otherwise,
continue on to step 3.
3. Compare the new/target element with the element stored at the root.
- If the new/target element is less than the root, then we need to travel down
its left subtree to continue our search.
- If the new/target element is greater than the root, then we need to travel down
its right subtree to continue our search.
- If the new/target element is equal to the root, then the search is over (add() returns false and leaves the tree unchanged;
contains() returns true).
4. To continue the search, make the left or right child (as chosen in step 3) the new root, then go back to step 2.
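The shared algorithm above can be sketched iteratively in Java. IterativeBST is a hypothetical class, not the textbook's code. It relies on compareTo(), so elements must be Comparable. add() keeps track of the last Entry visited so that the new Entry can be attached there when the search reaches null.

```java
class IterativeBST<E extends Comparable<? super E>> {
    static class Entry<E> {
        E element;
        Entry<E> parent, left, right;

        Entry(E element, Entry<E> parent) {
            this.element = element;
            this.parent = parent;
        }
    }

    private Entry<E> root;
    private int size;

    public boolean contains(E target) {
        Entry<E> current = root;                          // start at the root
        while (current != null) {                         // null: unsuccessful search
            int c = target.compareTo(current.element);    // compare
            if (c == 0) return true;                      // equal: found
            current = (c < 0) ? current.left : current.right;  // descend left or right
        }
        return false;
    }

    public boolean add(E element) {
        if (root == null) {                               // empty tree: new Entry is the root
            root = new Entry<>(element, null);
            size++;
            return true;
        }
        Entry<E> current = root;
        while (true) {
            int c = element.compareTo(current.element);
            if (c == 0) return false;                     // already present: tree unchanged
            Entry<E> next = (c < 0) ? current.left : current.right;
            if (next == null) {                           // reached the empty spot: attach here
                Entry<E> added = new Entry<>(element, current);
                if (c < 0) current.left = added;
                else current.right = added;
                size++;
                return true;
            }
            current = next;
        }
    }

    public int size() { return size; }

    public static void main(String[] args) {
        IterativeBST<Integer> t = new IterativeBST<>();
        for (int x : new int[] {50, 30, 70, 20, 40}) t.add(x);
        System.out.println(t.add(40));       // false (duplicate)
        System.out.println(t.size());        // 5
        System.out.println(t.contains(40));  // true
        System.out.println(t.contains(45));  // false
    }
}
```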
Implementing the remove() method
- Its parameter is the element, not the Entry
- First task is to find that element's Entry
- contains() conducts the search but does not return the Entry. What to do?
- Copy/paste the search algorithm from contains()? Easy, but...
- Encapsulate the algorithm into a protected/private method, then have both contains() and remove() call it?
- The second approach is preferred.
- It exemplifies single-point control
- A.k.a. "Write once, use many times."
- Simple example of single-point control: in the first version below, if the array size needs to be changed, the change is made in one place.
// single-point control
final int LIMIT = 100;
int[] scores = new int[LIMIT];
. . .
for (int i = 0; i < scores.length; i++)
// poor programming practice: the size 100 appears in two places
int[] scores = new int[100];
. . .
for (int i = 0; i < 100; i++)
- Other examples include modular programming (write a method/function once and use it all over the place),
OOP itself (encapsulation in particular), and table driven control.
- It is a major component of the practice of refactoring. See the book Refactoring by Martin Fowler.
- The major goal of single-point control and refactoring is to make software more maintainable.
- My first exposure to the term "single-point control" comes from the "Dear Professor..." article by Robert Glass in the April 1989 issue of Software magazine.
- A Google search for "single point control" Glass led to the article at
http://findarticles.com/p/articles/mi_m0SMG/is_n5_v9/ai_7242466
- Textbook follows this approach by defining protected Entry<E> getEntry(Object obj) and
rewriting contains() to call it.
- Once you have a reference to the Entry, you can proceed to remove it.
- This is reasonably complex due to the different possibilities
- Entry E being removed is a leaf
- Entry E being removed has one child
- Entry E being removed has two children
- Entry E being removed is a leaf
- If E is the tree's root, set new root to null (empty tree)
- If E is not the tree's root, replace the reference E's parent has to E with null.
- Entry E being removed has one child
- If E is the tree's root, set new root to be E's non-null child.
- If E is not the tree's root, replace the reference E's parent has to E with E's reference to its non-null child. (adoption)
- Entry E being removed has two children
- Locate E's successor S (leftmost Entry in E's right subtree), replace E's element with S's element, then recursively remove S.
- Note that E's successor S cannot itself have two children! Why not? So the removal of S falls into the first or second case.
- Note that E's successor S cannot be the root.
- Removal operation is also needed by TreeIterator's remove() method, and so is
defined in protected Entry<E> deleteEntry(Entry<E> p) and called by both methods.
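The removal cases above, together with the single-point-control idea, can be sketched as follows. The class name RemoveSketch, the add() and inOrder() helpers, and the choice of deleteEntry()'s return value are assumptions for illustration. Only the getEntry()/deleteEntry() split follows the description above. Methods take E rather than Object to avoid casts.

```java
import java.util.ArrayList;
import java.util.List;

class RemoveSketch<E extends Comparable<? super E>> {
    static class Entry<E> {
        E element;
        Entry<E> parent, left, right;

        Entry(E element, Entry<E> parent) {
            this.element = element;
            this.parent = parent;
        }
    }

    Entry<E> root;
    int size;

    // Single point of control for the search: both contains() and remove() call this.
    protected Entry<E> getEntry(E target) {
        Entry<E> e = root;
        while (e != null) {
            int c = target.compareTo(e.element);
            if (c == 0) return e;
            e = (c < 0) ? e.left : e.right;
        }
        return null;
    }

    public boolean contains(E target) {
        return getEntry(target) != null;
    }

    public boolean remove(E target) {
        Entry<E> e = getEntry(target);
        if (e == null) return false;
        deleteEntry(e);
        return true;
    }

    // Removes p's element from the tree. In this sketch it returns the Entry that was
    // physically unlinked (the successor, when p has two children).
    protected Entry<E> deleteEntry(Entry<E> p) {
        size--;
        // Two children: copy the successor's element into p, then delete the successor
        // instead. The successor is leftmost in p's right subtree, so it has no left child.
        if (p.left != null && p.right != null) {
            Entry<E> s = p.right;
            while (s.left != null) s = s.left;
            p.element = s.element;
            p = s;
        }
        // Now p is a leaf or has exactly one child.
        Entry<E> child = (p.left != null) ? p.left : p.right;
        if (child != null) child.parent = p.parent;   // p's parent adopts the child
        if (p.parent == null) root = child;           // p was the root
        else if (p == p.parent.left) p.parent.left = child;
        else p.parent.right = child;
        return p;
    }

    // Plain (unbalanced) add, included only so the example can build a tree.
    public boolean add(E element) {
        Entry<E> parent = null, e = root;
        int c = 0;
        while (e != null) {
            c = element.compareTo(e.element);
            if (c == 0) return false;
            parent = e;
            e = (c < 0) ? e.left : e.right;
        }
        Entry<E> added = new Entry<>(element, parent);
        if (parent == null) root = added;
        else if (c < 0) parent.left = added;
        else parent.right = added;
        size++;
        return true;
    }

    void inOrder(Entry<E> e, List<E> out) {
        if (e == null) return;
        inOrder(e.left, out);
        out.add(e.element);
        inOrder(e.right, out);
    }

    public static void main(String[] args) {
        RemoveSketch<Integer> t = new RemoveSketch<>();
        for (int x : new int[] {50, 30, 70, 20, 40, 60, 80}) t.add(x);
        t.remove(20);   // leaf
        t.remove(30);   // one child: 50 adopts 40
        t.remove(50);   // two children (and the root): 60 moves up into the root Entry
        List<Integer> out = new ArrayList<>();
        t.inOrder(t.root, out);
        System.out.println(out);             // [40, 60, 70, 80]
        System.out.println(t.root.element);  // 60
    }
}
```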
Finding the successor to a given Entry
- This is needed both by remove() and by TreeIterator's next()
- Since it is needed multiple places it is defined in protected Entry<E> successor(Entry<E> e)
- There are three relevant situations for getting the successor to Entry E
- E is null
- E has a right child
- E does not have a right child
- If E is null, its successor S is null.
- If E has a right child, its successor S will be the leftmost Entry in E's right subtree (smallest value greater than E's value)
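- If E does not have a right child, its successor S is found by tracing upward from E through its ancestors until reaching
an Entry P that is a left child (P may be E itself); P's parent is the successor to E. If no such P exists, E holds the largest value and its successor is null.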
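The three successor cases can be sketched as follows. The sketch also shows how an in-order iterator could use successor() in next(). SuccessorSketch and attach() are hypothetical names, and the tree is built by hand. The climbing loop follows the rule above: it stops at the first Entry (E itself or an ancestor) that is a left child, then returns that Entry's parent.

```java
class SuccessorSketch {
    static class Entry<E> {
        E element;
        Entry<E> parent, left, right;

        Entry(E element, Entry<E> parent) {
            this.element = element;
            this.parent = parent;
        }
    }

    // Hypothetical helper: creates an Entry and links it as parent's left or right child.
    static <E> Entry<E> attach(Entry<E> parent, E element, boolean asLeft) {
        Entry<E> child = new Entry<>(element, parent);
        if (asLeft) parent.left = child;
        else parent.right = child;
        return child;
    }

    static <E> Entry<E> successor(Entry<E> e) {
        if (e == null) return null;                    // E is null: no successor
        if (e.right != null) {                         // leftmost Entry in E's right subtree
            Entry<E> s = e.right;
            while (s.left != null) s = s.left;
            return s;
        }
        // No right child: climb while the current Entry is a right child. The loop stops
        // at the first Entry P (possibly e itself) that is a left child, or at the root.
        Entry<E> p = e;
        while (p.parent != null && p == p.parent.right) p = p.parent;
        return p.parent;                               // null if e held the largest value
    }

    public static void main(String[] args) {
        //        50
        //      /    \
        //    30      70
        //      \
        //       40
        Entry<Integer> root = new Entry<>(50, null);
        Entry<Integer> n30 = attach(root, 30, true);
        Entry<Integer> n40 = attach(n30, 40, false);
        Entry<Integer> n70 = attach(root, 70, false);

        System.out.println(successor(n30).element);  // 40 (leftmost in right subtree)
        System.out.println(successor(n40).element);  // 50 (30 is the left child; its parent is 50)
        System.out.println(successor(n70));          // null (largest element)

        // In-order iteration: start at the leftmost Entry, then repeatedly call successor().
        Entry<Integer> e = root;
        while (e.left != null) e = e.left;
        StringBuilder sb = new StringBuilder();
        for (; e != null; e = successor(e)) sb.append(e.element).append(' ');
        System.out.println(sb.toString().trim());    // 30 40 50 70
    }
}
```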
I have presented natural language descriptions of these algorithms; see the textbook for source code.
Peter Sanderson (PSanderson@otterbein.edu)