Section 12.1
Generic Programming
GENERIC PROGRAMMING refers to writing code that will work
for many types of data. We encountered the term in Section 8.3,
where we looked at dynamic arrays of integers. The source code presented there for working with
dynamic arrays of integers works only for data of type int. But the source code
for dynamic arrays of double, String, JButton, or any other
type would be almost identical. It seems silly to write essentially the same code over
and over. As we saw in Section 8.3, Java goes some distance towards solving this
problem by providing the ArrayList class. An ArrayList is essentially
a dynamic array of values of type Object. Since every class is a sub-class of
Object, objects belonging to any class can be stored in an ArrayList.
This is an example of generic programming: The source code for the ArrayList
class was written once, and it works for objects of any type. (It does, not, however,
work for data belonging to the primitive types, such as int and double.)
The ArrayList class is just one of several classes and interfaces
that are used for generic
programming in Java. We will spend this chapter looking at these classes and how they
are used. All the classes discussed in this chapter are defined in the package
java.util, and you will need an import statement at the beginning of your
program to get access to them. (Before you start putting import java.util.*
at the beginning of every program, you should know that some things in java.util have
names that are the same as things in other packages. For example, both java.util.List
and java.awt.List exist.)
It is no easy task to design a library for generic programming. Java's solution
has many nice features but is certainly not the only possible approach. It is almost
certainly not the best, but in the context of the overall design of Java,
it might be close to optimal. To get some perspective on generic programming
in general, it might be useful to look very briefly at generic
programming in two other languages.
Generic Programming in Smalltalk
Smalltalk was one of the very first object-oriented programming languages.
It is still used today. Although it has not achieved anything like the popularity
of Java or C++, it is the source of many ideas used in these languages. In
Smalltalk, essentially all programming is generic, because of two basic properties
of the language.
First of all, variables in Smalltalk are typeless. A data value has a type,
such as integer or string, but variables do not have types. Any variable can
hold data of any type. Parameters are also typeless, so a subroutine can
be applied to parameter values of any type. Similarly, a data structure
can hold data values of any type. For example, once you've defined a
binary tree data structure in SmallTalk, you can use it for binary trees
of integers or strings or dates or data of any other type. There is simply
no need to write new code for each data type.
Secondly, all data values are objects, and all operations on objects
are defined by methods in a class. This is true even for types that
are "primitive" in Java, such as integers. When the "+" operator is
used to add two integers, the operation is performed by calling a
method in the integer class. When you define a new class, you can
define a "+" operator, and you will then be able to add objects belonging
to that class by saying "a + b" just as if you were adding
numbers. Now, suppose that you write a subroutine that uses the "+" operator
to add up the items in a list. The subroutine can be applied to a list of
integers, but it can also be applied, automatically, to any other data
type for which "+" is defined. Similarly, a subroutine that uses the
"<" operator to sort a list can be applied to lists containing any
type of data for which "<" is defined. There is no need to write
a different sorting subroutine for each type of data.
Put these two features together and you have a language where data
structures and algorithms will work for any type of data for which
they make sense, that is, for which the appropriate operations are
defined. This is real generic programming.
This might sound pretty good, and you might be asking yourself why
all programming languages don't work this way. This type of freedom
makes it easier to write programs, but unfortunately it makes it
harder to write programs that are correct and robust.
(See Chapter 9.)
Once you have a data structure that can contain data of any type, it becomes
hard to ensure that it only holds the type of data that you want it to
hold. If you have a subroutine that can sort any type of data,
it's hard to ensure that it will only be applied to data for which
the "<" operator is defined. More particularly, there is no
way for a compiler to ensure these things. The problem
will show up at run time when an attempt is made to apply some operation
to a data type for which it is not defined, and the program will
crash.
Generic Programming in C++
Unlike Smalltalk, C++ is a very strongly typed language, even more so
than Java. Every variable has a type, and can only hold data values of
that type. This means that the type of generic programming used in
Smalltalk is impossible. Furthermore, C++ does not have anything
corresponding to Java's Object class. That is, there is
no class that is a superclass of all other classes. This means that
C++ can't use Java's style of generic programming either.
Nevertheless, C++ has a powerful and flexible system of generic
programming. It is made possible by a language feature known as
templates. In C++, instead of
writing a different sorting subroutine for each type of data, you
can write a single subroutine template. The template is not
a subroutine; it's more like a factory for making subroutines.
We can look at an example, since the syntax of C++ is very
similar to Java's:
template<class ItemType>
void sort( ItemType A[], int count ) {
// Sort count items in the array, A, into increasing order.
// The algorithm that is used here is selection sort.
for (int i = count-1; i > 0; i--) {
int position_of_max = 0;
for (int j = 1; j <= count ; j++)
if ( A[j] > A[position_of_max] )
position_of_max = j;
ItemType temp = A[count];
A[count] = A[position_of_max];
A[position_of_max] = temp;
}
}
This piece of code defines a subroutine template. If you remove the
first line, "template<class ItemType>", and substitute the
word "int" for the word "ItemType" in the rest of the template,
you get a subroutine for sorting arrays of ints. (Even though it says
"class ItemType", you can actually substitute any type for ItemType,
including the primitive types.) If you substitute
"string" for "ItemType", you get a subroutine for sorting arrays of
strings. This is pretty much what the compiler does with the template.
If your program says "sort(list,10)" where list is an array of ints,
the compiler uses the template to generate a subroutine for sorting
arrays of ints. If you say "sort(cards,10)" where cards is an
array of objects of type Card, then the compiler generates a subroutine
for sorting arrays of Cards. At least, it tries to. The template uses
the ">" operator to compare values. If this operator is defined for values
of type Card, then the compiler will successfully use the template
to generate a subroutine for sorting Cards. If ">" is not defined
for Cards, then the compiler will fail -- but this will happen at
compile time, not, as in Smalltalk, at run time where it would make the
program crash.
C++ also has templates for making classes. If you write a template
for binary trees, you can use it to generate classes for binary trees of
ints, binary trees of strings, binary trees of dates, and so on -- all from
one template. The most recent version of C++ comes with a large number
of pre-written templates called the Standard
Template Library or STL. The STL is quite complex. Many people
would say that its much too complex. But it is also one of the most
interesting features of C++.
Generic Programming in Java
Like C++, Java is a strongly typed language. However, generic programming
in Java is closer in spirit to Smalltalk than it is to C++.
As I've already noted, generic programming in Java is based on the
fact that class Object is a superclass of every other class.
To some extent, this makes Java similar to Smalltalk: A data structure
designed to hold Objects can hold values belonging to any class.
There is no need for templates or any other new language feature to
support generic programming.
Of course, primitive type values, such as integers, are not objects
in Java and therefor cannot be stored in generic data structures. In fact,
there is no way to do generic programming with the primitive data types
in Java. The Smalltalk approach doesn't work except for objects, and
the C++ approach is not available. Furthermore, generic subroutines are
more problematic in Java than they are in either Smalltalk or C++.
In Smalltalk, a subroutine can be called with parameter values of any
type, and it will work fine as long as all the operations used by the
subroutine are supported by the actual parameters. In Java, parameters
to a subroutine must be of a specified type, and the subroutine can only
use operations that are defined for that type. A subroutine with a parameter
of type Object can be applied to objects of any type, but the
subroutine can only use operations that are defined in class Object,
and there aren't many of those! For example, there is no comparison operation
defined in the Object class, so it is not possible to write a
completely generic sorting algorithm. We'll see below how Java addresses
this problem.
Because of problems like these, some people (including myself) claim that
Java does not really support true generic programming. Other people disagree.
But whether it's true generic programming or not, that doesn't prevent it
from being very useful.
Collections and Maps
Java's generic data structures can be divided into two categories:
collections and maps.
A collection is more or less what it sound like: a collection of objects.
A map associates objects in one set with objects in another set in the
way that a dictionary associates definitions with words or a phone book
associates phone numbers with names. A map is similar to what I called
an "association list" in Section 8.4.
There are two types of collections: lists
and sets. A list is a collection in which
the objects are arranged in a linear sequence. A list has a first item,
a second item, and so on. For any item in the list, except the last,
there is an item that directly follows it. A set is a collection in
which no object can appear more than once.
Note that the terms "collection," "list," "set," and "map" tell you
nothing about how the data is stored. A list could be represented as an
array, as a linked list, or, for that matter, as a map that associates
the elements of the list to the numbers 0, 1, 2, ....
In fact, these terms are represented in Java not by classes but
by interfaces. The interfaces Collection, List,
Set, and Map specify the basic operations on data
structures of these types, but do not specify how the data structures
are to be represented or how the operations are to be implemented.
That will be specified in the classes that implement the interfaces.
Even when you use these classes, you might not know what the implementation is unless you
go look at the source code. Java's generic data structures are
abstract data types. They are defined
by the operations that can be performed on them, not by the physical
layout of the data in the computer.
We will look at list and set classes in Section 2
and map classes in Section 3. But before we do
that, we'll look briefly at some of the general operations that are
available for all collections.
Generic Algorithms and Iterators
The Collection interface includes methods for performing some
basic operations on collections of objects. Since "collection" is a very
general concept, operations that can be applied to all collections are
also very general. They are generic operations in the sense that they
can be applied to various types of collections containing various types
of objects. Suppose that coll is any object that implements the Collection
interface. Here are some of the operations that are defined:
- coll.size() -- returns an int that gives
the number of objects in the collection.
- coll.isEmpty() -- returns a boolean value which is true
if the size of the collection is 0.
- coll.clear() -- removes all objects from the collection.
- coll.contains(object) -- returns a boolean value that
is true if object is in the collection.
- coll.add(object) -- adds object to the collection.
The parameter can be any Object. Some collections can contain the value null,
while others cannot. This method returns a boolean value which tells you whether the operation
actually modified the collection. For example, adding an object to a Set has no effect
if that object was already in the set.
- coll.remove(object) -- removes object from
the collection, if it occurs in the collection,
and returns a boolean value that tells you whether the object was found.
- coll.containsAll(coll2) -- returns a boolean value that
is true if every object in coll2 is also in the coll. The parameter
can be any Collection.
- coll.addAll(coll2) -- adds all the objects in the collection
coll2 to coll.
- coll.removeAll(coll2) -- removes every object from
coll that also occurs in the collection coll2.
- coll.retainAll(coll2) -- removes every object from
coll that does not occur in the collection coll2. It "retains" only
the objects that do occur in coll2.
- coll.toArray() -- returns an array of type
Object[] that contains all the items in the collection. The return value can be
type-cast to another array type, if appropriate. For example, if you know that all the
items in coll are of type String, then (String[])coll.toArray()
gives you an array of Strings containing all the strings in the collection.
Since these methods are part of the Collection interface, they must be defined
for every object that implements that interface. There is a problem with this, however.
For example, the size of some kinds of Collection cannot be changed after
they are created. Methods that add or remove objects don't make sense for these collections.
While it is still legal to call the methods, an exception will be thrown when the call is
evaluated at run time. The type of exception is UnsupportedOperationException.
There is also the question of efficiency. Even when an operation is defined for
several types of collections, it might not be equally efficient in all cases. Even
a method as simple as size() can vary greatly in efficiency. For some collections,
computing the size() might involve counting the items in the collection.
The number of steps in this process is equal to the number of items.
Other collections might have instance variables to keep track of the size, so evaluating
size() just means returning the value of a variable. In this case, the computation
takes only one step, no matter how many items there are. When working with collections,
it's good to have some idea of how efficient operations are and to choose a collection
for which the operations you need can be implemented most efficiently. We'll see
specific examples of this in the next two sections.
The Collection interface defines a few basic generic algorithms, but
suppose you want to write your own generic algorithms. Suppose, for example, you
want to do something as simple as printing out every item in a collection. To do this in
a generic way, you need some way of going through an arbitrary collection, accessing
each item in turn. We have seen how to do this for specific data structures: For
an array, you can use a for loop to iterate through all the array indices. For
a linked list, you can use a while loop in which you advance a pointer along the list.
For a binary tree, you can use a recursive subroutine to do an infix traversal.
Collections can be represented in any of these forms and many others besides.
With such a variety of traversal mechanisms, how can we hope to come up with a single
generic method that will work for collections that are stored in wildly different
forms? This problem is solved by iterators.
An iterator is an object that can be used to traverse a collection. Different
types of collections have different types of iterators, but all iterators are
used in the same way. An algorithm that uses an iterator to traverse a collection
is generic, because the same technique can be applied to any type of collection. Iterators can seem
rather strange to someone who is encountering generic programming for the first time,
but you should understand that they solve a difficult problem in an elegant way.
The Collection interface defines a method that can be used to obtain
an iterator for any collection. If coll is a collection, then
coll.iterator() returns an iterator that can be used to traverse the
collection. You should think of the iterator as a kind of generalized pointer
that starts at the beginning of the collection and can move along the collection
from one item to the next. Iterators are defined by an interface named Iterator. This
interface defines just three methods. If iter refers to an Iterator,
then:
- iter.next() -- returns the next item, and
advances the iterator. The return value is of type Object. Note that there is no way
to look at an item without advancing the iterator past that item. If this method is called
when no items remain, it will throw a NoSuchElementException.
- iter.hasNext() -- returns a boolean value
telling you whether there are more items to be processed. You should test this before
calling iter.next().
- iter.remove() -- if you call this after
calling iter.next(), it will remove the item that you just saw from the collection.
This might produce an UnsupportedOperationException, if the collection does
not support removal of items.
Using iterators, we can write code for printing all the items in any collection.
Suppose that coll is of type Collection. Then we can say:
Iterator iter = coll.iterator();
while ( iter.hasNext() ) {
Object item = iter.next();
System.out.println(item);
}
The same general form will work for other types of processing. For example,
here is a subroutine that will remove all null values from any collection
(as long as that collection supports removal of values):
void removeNullValues( Collection coll ) {
Iterator iter = coll.iterator():
while ( iter.hasNext() ) {
Object item = iter.next();
if (item == null)
iter.remove();
}
}
Collections can hold objects of any type, so the return value of iter.next()
is Object. Now, there's not very much you can do with a general Object.
In practical situations, a collection is used to hold objects belonging to some more
specific class, and objects from the collection are type-cast to that class before they
are used. Suppose, for example, that we are working with Shapes,
where Shape is a class that represents geometric figures. Suppose that the
Shape class includes a draw() method for drawing the figure. Then
we can write a generic method for drawing all the figures in a collection of Shapes:
void drawAllShapes( Collection shapeCollection ) {
// Precondition: Every item in shapeCollection is non-null
// and belongs to the class Shape.
Iterator iter = shapeCollection.iterator();
while ( iter.hasNext() ) {
Shape figure = (Shape)iter.next();
figure.draw();
}
}
The precondition of this method points out that the method will fail
if the method contains an item that does not belong to class Shape.
When that item is encountered, the type-cast "(Shape)iter.next()"
will cause an exception of type ClassCastException. Although it's
unfortunate that we can't have a "Collection of Shapes" in Java, rather
than a "Collection of Objects", it's not a big problem in practice. You just
have to be aware of what type of objects you are storing in your collections.
Equality and Comparison
The discussion of methods in the Collection interface had an
unspoken assumption: It was assumed that it's known what it means for
two objects to be "equal." For example, the methods coll.contains(object)
and coll.remove(object) look for an item in the collection that is equal
to object. However, equality is not such a simple matter. The obvious
technique for testing equality -- using the == operator -- does not usually give
a reasonable answer when applied to objects. The == operator tests whether
two objects are identical in the sense that they share the same location in memory.
Usually, however, we want to consider two objects to be equal if they represent the same value,
which is a very different thing. Two values of type String should be
considered equal if they contain the same sequence of characters. The question of
whether those characters are stored in the same location in memory is irrelevant.
Two values of type Date should be considered equal if they represent
the same time.
The Object class defines a boolean-valued method equals(Object)
for testing whether one object is equal to another. For the purposes of collections,
obj1 and obj2 are considered to be equal if they are both null,
or if they are both non-null and obj1.equals(obj2) is true.
In the Object class, obj1.equals(obj2) is defined to be the same as
obj1 == obj2. However, for most sub-classes of Object, this
definition is not reasonable, and it should be overridden. The String class,
for example, overrides equals() so that for a String str, str.equals(obj)
if obj is also a String and obj contains the same sequence of characters
as str.
If you write your own class, you might want to define an equals() method in
that class to get the correct behavior when objects are tested for equality. For example,
a Card class that will work correctly when used in collections could be
defined as:
public class Card { // Class to represent playing cards.
int suit; // Number from 0 to 3 that codes for the suit --
// spades, diamonds, clubs or hearts.
int value; // Number from 1 to 13 that represents the value.
public boolean equals(Object obj) {
if (obj == null || ! (obj instanceof Card) ) {
// obj can't be equal to this Card if obj
// is not a Card, or if it is null.
return false;
}
else {
Card other = (Card)obj; // Type-cast obj to a Card.
if (suit == other.suit && value == other.value) {
// The other card has the same suit and value as
// this card, so they should be considered equal.
return true;
}
else
return false;
}
}
... // other methods and constructors
}
Without the equals() method in this class, methods such as
contains() and remove() from the Collection
interface will not work as expected for values of type Card.
A similar concern arises when items in a collection are sorted.
Sorting refers to arranging a sequence of items in ascending order,
according to some criterion. The problem is that there is
no natural notion of ascending order for arbitrary objects. Before
objects can be sorted, some method must be defined for comparing them.
Objects that are meant to be compared should implement the
interface java.lang.Comparable. This interface defines
one method:
public int compareTo(Object obj)
The value returned by obj1.compareTo(obj2) should be
zero if and only if the objects are equal (that is, if obj1.equals(obj2)
is true). It should be negative if and only if obj1 comes
before obj2, when the objects are arranged in ascending order.
And it should be positive if and only if obj1 comes after obj2.
In general, it should be considered an error to call obj1.compareTo(obj2)
if obj2 is not of the same type as obj1. The String
class implements the
Comparable interface and defines compareTo in a reasonable way.
If you define your own class and want to be able to sort objects belonging
to that class, you should do the same. For example:
class FullName implements Comparable {
// Represents a full name consisting of a first
// name and a last name.
String firstName, lastName;
public boolean equals(Object obj) {
if (obj == null || ! (obj instanceof FullName)) {
return false;
}
else {
FullName other = (FullName)obj;
return firstName.equals(other.firstName)
&& lastName.equals(other.lastName);
}
}
public void compareTo(Object obj) {
Fullname other = (FullName)obj;
// Will cause an error if obj is not a FullName.
if ( lastName.compareTo(other.lastName) < 0 ) {
// If lastName comes before the last name of
// the other object, then this FullName comes
// before the other FullName. Return a negative
// value to indicate this.
return -1;
}
if ( lastName.compareTo(other.lastName) > 0 ) {
// If lastName comes after the last name of
// the other object, then this FullName comes
// after the other FullName. Return a positive
// value to indicate this.
return 1;
}
else {
// Last names are the same, so base the comparison
// on the first names.
return firstName.compareTo(other.firstName);
}
}
... // other methods and constructors
}
There is another way to allow for comparison of objects in Java,
and that is to provide a separate object that is capable of making
the comparison. The object must implement the interface
java.util.Comparator, which defines the method:
public int compare(Object obj1, Object obj2)
This method compares two objects and returns a value that is
negative, or zero, or positive depending on whether obj1 comes
before obj2, or is the same as obj2, or comes
after obj2. Comparators are useful for comparing objects
that do not implement the Comparable interface and for
defining several different orderings on the same collection of
objects.
In the next two sections, we'll see how Comparable and Comparator are used
in the context of collections and maps.
Wrapper Classes
As noted above, Java's generic programming does not apply to the
primitive types. Before leaving this section, we should try to
address this problem.
You can't store an integer in a generic data structure designed to
hold Objects. On the other hand, there is nothing to stop you
from making an object that contains an integer and putting
that object into the data structure. In the simplest case, you could
define a class that does nothing but contain an integer:
public class IntContainer {
public int value;
}
In fact, Java already has a class similar to this one. An object
belonging to the class java.lang.Integer contains a single
int. It is called a wrapper for
that int. The int value is provided as a parameter
to the constructor. For example,
Integer val = new Integer(17);
creates an Integer object that "wraps" the number 17. The
Integer object can be used in generic data structures and in
other situations where an object is required. The int value
is stored in a private final instance variable of
the Integer object. If val refers to an object of
type Integer, you can find out what int it contains
by calling the instance method val.intValue(). There is no
way to change that value. We say that an Integer is an
immutable object. That is, after it has
been constructed, there is no way to change it. (Similarly, an
object of type String is immutable.)
There are wrapper classes for all the primitive types. All objects
belonging to these classes are immutable. The wrapper
class for values of type double is java.lang.Double.
The value stored in an object of type Double can be retrieved
by calling the instance method doubleValue().
The wrapper classes define a number of useful methods. Some of them
exist to support generic programming. For example, the wrapper classes
all define instance methods equals(Object) and compareTo(Object)
in a reasonable way. Other methods in the wrapper classes are utility functions for
working with the primitive types. For example, we encountered the
static methods Integer.parseInt(String) and Double.parseDouble(String)
in Section 7.4. These functions are used
to convert strings such as "42" or "2.71828" into the numbers they represent.