Section 12.2
List and Set Classes
IN THE PREVIOUS SECTION, we looked at the general properties
of collection classes in Java. In this section, we look at some specific collection
classes and how to use them. These classes can be divided into two
categories: lists and sets. A list consists of a sequence
of items arranged in a linear order. A list has a definite order, but is not necessarily
sorted into ascending order. A set is a collection that has no duplicate entries.
The elements of a set might or might not be arranged into some definite order. As with
all of Java's collection classes, the items in a list or set are of type Object.
The ListArray and LinkedList Classes
There are two obvious ways to represent a list: as a dynamic array and as
a linked list. We've encountered these already in Sections 8.3 and
11.2. Both of these options are available in generic
form as the collection classes java.util.ListArray and java.util.LinkedList. That is,
a ListArray represents an ordered sequence of objects stored in an array that will
grow in size as new items are added, and a LinkedList represents an ordered sequence
of objects stored in nodes that are linked together with pointers. Both of these
classes implement an interface java.util.List, which specifies operations that
are available for all lists.
Both list classes support the basic list operations, and an abstract data type is
defined by its operations, not by its representation. So why two classes? Why not a
single List class with a single representation? The problem is that there is
no single representation of lists for which all list operations are efficient. For
some operations, linked lists are more efficient than arrays. For others, arrays are
more efficient. In a particular application of lists, it's likely that only a few
operations will be used frequently. You want to choose the representation for which
the frequently used operations will be as efficient as possible.
Broadly speaking, the LinkedList class is more efficient in applications where
items will often be added or removed at the beginning of the list or in the middle of the list.
In an array, these operations require moving a large number of items up or down one
position in the array, to make a space for a new item or to fill in the hole left by
the removal of an item. In a linked list, nodes can be added or removed at any position
by changing a few pointer values. The ArrayList class is more efficient when
random access to items is required. Random access means
accessing the n-th item in the list, for any integer n. This is trivial for an array,
but for a linked list it means starting at the beginning of the list and moving from
node to node along the list for n steps. Operations that can be done efficiently for
both types of lists include sorting and adding an item at the end of the list.
All lists implement the Collection methods discussed in the
previous section, including size(), isEmpty(),
add(Object), remove(Object), and clear(). The
add(Object) method adds the object at the end of the list. The remove(Object)
method involves first finding the object, which is not very efficient for any list
since it involves going through the items in the list from beginning to end until the
object is found. The List interface adds some methods for accessing list
items according to their numerical positions in the list. For an object, list,
of type List, these methods include:
- list.get(index) -- returns the Object
at position index in the list, where index is an integer. Items
are numbered 0, 1, 2, ..., list.size()-1. The parameter must be in this
range, or an IndexOutOfBoundsException is thrown.
- list.set(index,obj) -- stores an object obj
at position number index in the list, replacing the object that was there
previously. This does not change the number of elements in the list or move any
of the other elements.
- list.add(index,obj) -- inserts an object obj
into the list at position number index. The number of items in the list
increases by one, and items that come after position index move up one
position to make room for the new item. The value of index can be
in the range 0 to list.size(), inclusive.
- list.remove(index) -- removes the object at
position number index. Items after this position move up one space in the
list to fill the hole.
- list.indexOf(obj) -- returns an int
that gives the position of obj in the list, if it occurs. If it does not
occur, the return value is -1. If obj occurs more than once in the
list, the index of the first occurrence is returned.
These methods are defined in both the ArrayList class and in the LinkedList
class, although they are only efficient for ArrayLists.
The LinkedList class adds a few additional methods, which are not defined
for an ArrayList. If linkedlist is an object of type LinkedList,
then
- linkedlist.getFirst() -- returns the Object
that is the first item in the list. The list is not modified.
- linkedlist.getLast() -- returns the Object
that is the last item in the list. The list is not modified.
- linkedlist.removeFirst() -- removes the first item
from the list, and returns that Object as its return value.
- linkedlist.removeLast() -- removes the last item
from the list, and returns that Object as its return value.
- linkedlist.addFirst(obj) -- adds the Object, obj,
to the beginning of the list.
- linkedlist.addLast(obj) -- adds the Object, obj,
to the end of the list. (This is exactly the same as linkedlist.add(obj) and is apparently
defined just to keep the naming consistent.)
These methods are apparently defined to make it easy to use a LinkedList as
if it were a stack or a queue. (See Section 11.3.) For
example, we can use a LinkedList as a queue by adding items onto one
end of the list (using the addLast() method) and removing them from the
other end (using the removeFirst() method).
If list is an object of type List, then the method
list.iterator(), defined in the Collection interface,
returns an Iterator that can be used to traverse the list from
beginning to end. However, for Lists, there is a special type
of Iterator, called a ListIterator, which offers additional
capabilities. The method list.listIterator() returns a ListIterator
for list.
A ListIterator has the usual Iterator methods
hasNext() and next(), but it also has methods hasPrevious()
and previous() that make it possible to move backwards in the list.
To understand how these work, its best to think of an iterator as pointing
to a position between two list elements, or at the beginning or
end of the list. In this diagram, the items in a list are represented by squares,
and arrows indicate the possible positions of an iterator:
If iter is a ListIterator(), iter.next() moves the
iterator one space to the right along the list and returns the item that the iterator passes
as it moves. The method iter.previous() moves the iterator one space
to the left along the list and returns the item that it passes. The method
iter.remove() removes an item from the list; the item that is removed is the item that
the iterator passed most recently in a call to either iter.next() or iter.previous().
There is also a
method iter.add(Object) that adds the specified object to the list
at the current position of the iterator. This can be between two existing
items or at the beginning of the list or at the end of the list.
(By the way, the lists that are used in the LinkedList class are
doubly linked lists. That is, each node in the
list contains two pointers -- one to the next node in the list and one to
the previous node. This makes it possible to implement effeciently
both the next() and previous() methods of a ListIterator.
Also, to make the addLast() and getLast() methods of a LinkedList
efficient, the LinkedList class includes an instance variable that points
to the last node in the list.)
As an example of using a ListIterator, suppose that we want to maintain a list of items that
is always sorted into increasing order. When adding an item to the list,
we can use a ListIterator to find the position in the list
where the item should be added. The idea is to start at the beginning
of the list and to move the iterator forward past all the items that
are bigger than the item that is being inserted. At that point, the
iterator's add() method can be used to insert the item at its
correct position in the list. In order to say
what it means for one item to be "bigger" than another, we assume
that the items in the list implement the Comparable interface
and define the compareTo() method. (This interface was
discussed in the previous section.)
Here is a method that will do this:
static void orderedInsert(List list, Comparable newItem) {
// Precondition: The items in list are sorted into ascending
// order, according to the compareTo method.
// newitem.compareTo(item) must be defined for
// each item in the list.
//
// Postcondition: newItem has been added to the list in its
// correct position, so that the list is still
// sorted into ascending order.
ListIterator iter = list.listIterator();
// Move the iterator so that it points to the position where
// newItem should be inserted into the list. If newItem is
// bigger than all the items in the list, then the while loop
// will end when iter.hasNext() becomes false, that is, when
// the iterator has reached the end of the list.
while (iter.hasNext()) {
Object item = iter.next();
if (newItem.compareTo(item) <= 0) {
// newItem should come BEFORE item in the list.
// Move the iterator back one space so that
// it points to the correct insertion point,
// and end the loop.
iter.previous();
break;
}
}
iter.add(newItem);
} // end orderedInsert()
Since the parameter in this method is of type List, it can be applied
to both ArrayLists and LinkedLists, and it will be about equally
efficient for both types of lists. You would probably find it easier to write
an orderedInsert method using array-like indexing with the methods
get(index) and add(index,obj). However, that method would
be horribly inefficient for LinkedLists because get(index) is
so inefficient for such lists. You can find a program that tests
this method in the file ListInsert.java.
Sorting
Sorting a list is a fairly common operation, and there should really be a
sorting method in the List interface. For some reason, there is not,
but methods for sorting Lists are available as static
methods in the class java.util.Collections. This class contains
a variety of static utility methods for working with collections. The command
Collections.sort(list);
can be used to sort a list into ascending order. The items in the list
must implement the Comparable interface. This method will work, for
example, for lists of Strings. If a Comparator is provided
as a second argument:
Collections.sort(list,comparator);
then the comparator will be used to compare the items in the list.
As mentioned in the previous section, a Comparator is an object
that defines a compare() method that can be used to compare
two objects. We'll see an example of using a Comparator in
Section 4.
The Collections class has at least two other useful methods
for modifying lists. Collections.shuffle(list) will rearrange
the elements of the list into a random order. Collections.reverse(list)
will reverse the order of the elements, so that the last element is moved
to the beginning of the list, the next-to-last element to the second position,
and so on.
Since an efficient sorting method is provided for Lists, there is
no need to write one yourself. You might be wondering whether there is an
equally convenient method for standard arrays. The answer is yes.
Array-sorting methods are available as static methods in the class
java.util.Arrays. The command:
Arrays.sort(A);
will sort an array, A, provided either that the base type of
A is one of the primitive types (except boolean) or that A is an array
of Objects that implement the Comparable interface.
You can also sort part of an array. This is important since arrays
are often only "partially filled." The command:
Arrays.sort(A,fromIndex,toIndex);
sorts the elements A[fromIndex], A[fromIndex+1], ..., A[toIndex-1]
into ascending order. You can use Arrays.sort(A,0,N) to sort
a partially filled array which has elements in the first N positions.
Java does not support generic programming for primitive types.
In order to implement the command Arrays.sort(A), the Arrays
class contains eight methods: one method for arrays of Objects and
one method for each of the primitive types byte, short, int,
long, float, double, and char.
The TreeSet and HashSet Classes
A set is a collection of Objects in which no object occurs
more than once. Objects obj1 and obj2 are considered
to be the same if obj1.equals(obj2) is true, as discussed in
the previous section. Sets implement all the general Collection
methods, but do so in a way that ensures that no element occurs twice
in the set. For example, if set is an object of type Set,
then set.add(obj) will have no effect on the set if obj is
already an element of the set. Java has two classes that implement the
Set interface: java.util.TreeSet and java.util.HashSet.
In addition to being a Set, a TreeSet has the property
that the elements of the set are arranged into ascending sorted order.
An Iterator for a TreeSet will always visit the elements
of the set in ascending order.
A TreeSet cannot hold arbitrary objects, since
there must be a way to determine the sorted order of the objects it contains.
Ordinarily, this means that the objects in a TreeSet should implement
the Comparable interface and that obj1.compareTo(obj2) should
be defined in a reasonable way for any two objects obj1 and obj2
in the set. Alternatively, a Comparator can be provided as a parameter
to the constructor when the TreeSet is created. In that case, the
Comparator will be used to compare objects that are added to the set.
In the implementation of a TreeSet, the elements are stored in
something like a binary sort tree. (See Section 11.4.)
The actually type of tree that is used is balanced in
the sense that all the leaves of the tree are at about the same distance from the
root of the tree. The number of operations required to find an item in a sorted tree
is the same as the distance from the root of the tree to the item. Using a balanced
tree ensures that all items are as close to the root as possible. This makes
finding an item very efficient. Adding and removing elements are equally efficient.
The fact that a TreeSet sorts its elements and removes duplicates
makes it very useful in some applications. In Section 10.3,
I presented a program, WordList.java, that
reads all the words in a file and outputs a list of the words it found. The list
is sorted and duplicates have been removed. In that program, I used a linked list to
store the words and had to write a subroutine to make sure that the list was
sorted and contained no duplicates. By using a TreeSet instead of a list,
that part of the programming is taken care of automatically. An algorithm for
the program, using a TreeSet, would be:
TreeSet words = new TreeSet();
while there is more data in the input file:
Let word = the next word from the file.
words.add(word);
Iterator iter = words.iterator();
while (iter.hasNext()):
Write iter.next() to the output file.
If you would like to see a complete, working program, you can find it
in the file WordListWithTreeSet.java.
As another example, suppose that coll is any Collection of
Strings (or any other type for which compareTo() is properly
defined). We can use a TreeSet to sort the items of coll and
remove the duplicates simply by saying:
TreeSet set = new TreeSet();
set.addAll(coll);
The second statement adds all the elements of the collection to the
set. Since it's a Set, duplicates are ignored. Since it's a
TreeSet, the elements of the set are sorted. If you would like
to have the data in some other type of data structure, it's easy to copy the
data from the set. For example, to place the answer in an ArrayList,
you could say:
TreeSet set = new TreeSet();
set.addAll(coll);
ArrayList list = new ArrayList();
list.addAll(set);
Now, in fact, every one of Java's collection classes has a constructor
that takes a Collection as an argument. All the items in that
Collection are added to the new collection when it is created.
So, new TreeSet(coll) creates a TreeSet
that contains the same elements as the Collection, coll.
This means that we can abbreviate the four lines in the above example
to the single command:
ArrayList list = new ArrayList( new TreeSet(coll) );
This makes a sorted list of the elements of coll with no duplicates.
A nice example of the power of generic programming. (It seems, by the way,
there is no equally easy way to get the list with duplicates. To do
this, we would need something like a TreeSet that allows duplicates.
The C++ programming language has such a thing and refers to it as a
multiset. The Smalltalk language has something
similar and calls it a bag. Java, for the time
being at least, lacks this data type.)
A HashSet stores its elements in a hash table,
a type of data structure that I will discuss in the next section.
The operations of finding, adding, and removing elements are implemented very
efficiently in hash tables, even more so than for TreeSets. The elements
of a HashSet are not stored in any particular order. An Iterator
for a HashSet will visit its elements in what seems to be a completely
arbitrary order, and it's possible for the order to change if a new element is
added. Because the elements of a HashSet are not ordered, they do
not have to implement the Comparable interface. Use a HashSet
instead of a TreeSet when the elements it contains are not comparable,
or when the order is not important, or when the small advantage in efficiency
is important.