Java Programming Introduction

Section 12.1
Generic Programming

GENERIC PROGRAMMING refers to writing code that will work for many types of data. We encountered the term in Section 8.3, where we looked at dynamic arrays of integers. The source code presented there for working with dynamic arrays of integers works only for data of type int. But the source code for dynamic arrays of double, String, JButton, or any other type would be almost identical. It seems silly to write essentially the same code over and over. As we saw in Section 8.3, Java goes some distance towards solving this problem by providing the ArrayList class. An ArrayList is essentially a dynamic array of values of type Object. Since every class is a sub-class of Object, objects belonging to any class can be stored in an ArrayList. This is an example of generic programming: The source code for the ArrayList class was written once, and it works for objects of any type. (It does, not, however, work for data belonging to the primitive types, such as int and double.)

The ArrayList class is just one of several classes and interfaces that are used for generic programming in Java. We will spend this chapter looking at these classes and how they are used. All the classes discussed in this chapter are defined in the package java.util, and you will need an import statement at the beginning of your program to get access to them. (Before you start putting import java.util.* at the beginning of every program, you should know that some things in java.util have names that are the same as things in other packages. For example, both java.util.List and java.awt.List exist.)

It is no easy task to design a library for generic programming. Java's solution has many nice features but is certainly not the only possible approach. It is almost certainly not the best, but in the context of the overall design of Java, it might be close to optimal. To get some perspective on generic programming in general, it might be useful to look very briefly at generic programming in two other languages.

Generic Programming in Smalltalk

Smalltalk was one of the very first object-oriented programming languages. It is still used today. Although it has not achieved anything like the popularity of Java or C++, it is the source of many ideas used in these languages. In Smalltalk, essentially all programming is generic, because of two basic properties of the language.

First of all, variables in Smalltalk are typeless. A data value has a type, such as integer or string, but variables do not have types. Any variable can hold data of any type. Parameters are also typeless, so a subroutine can be applied to parameter values of any type. Similarly, a data structure can hold data values of any type. For example, once you've defined a binary tree data structure in SmallTalk, you can use it for binary trees of integers or strings or dates or data of any other type. There is simply no need to write new code for each data type.

Secondly, all data values are objects, and all operations on objects are defined by methods in a class. This is true even for types that are "primitive" in Java, such as integers. When the "+" operator is used to add two integers, the operation is performed by calling a method in the integer class. When you define a new class, you can define a "+" operator, and you will then be able to add objects belonging to that class by saying "a + b" just as if you were adding numbers. Now, suppose that you write a subroutine that uses the "+" operator to add up the items in a list. The subroutine can be applied to a list of integers, but it can also be applied, automatically, to any other data type for which "+" is defined. Similarly, a subroutine that uses the "<" operator to sort a list can be applied to lists containing any type of data for which "<" is defined. There is no need to write a different sorting subroutine for each type of data.

Put these two features together and you have a language where data structures and algorithms will work for any type of data for which they make sense, that is, for which the appropriate operations are defined. This is real generic programming. This might sound pretty good, and you might be asking yourself why all programming languages don't work this way. This type of freedom makes it easier to write programs, but unfortunately it makes it harder to write programs that are correct and robust. (See Chapter 9.) Once you have a data structure that can contain data of any type, it becomes hard to ensure that it only holds the type of data that you want it to hold. If you have a subroutine that can sort any type of data, it's hard to ensure that it will only be applied to data for which the "<" operator is defined. More particularly, there is no way for a compiler to ensure these things. The problem will show up at run time when an attempt is made to apply some operation to a data type for which it is not defined, and the program will crash.

Generic Programming in C++

Unlike Smalltalk, C++ is a very strongly typed language, even more so than Java. Every variable has a type, and can only hold data values of that type. This means that the type of generic programming used in Smalltalk is impossible. Furthermore, C++ does not have anything corresponding to Java's Object class. That is, there is no class that is a superclass of all other classes. This means that C++ can't use Java's style of generic programming either. Nevertheless, C++ has a powerful and flexible system of generic programming. It is made possible by a language feature known as templates. In C++, instead of writing a different sorting subroutine for each type of data, you can write a single subroutine template. The template is not a subroutine; it's more like a factory for making subroutines. We can look at an example, since the syntax of C++ is very similar to Java's:
      template<class ItemType>
      void sort( ItemType A[], int count ) {
            // Sort count items in the array, A, into increasing order.
            // The algorithm that is used here is selection sort.
         for (int i = count-1; i > 0; i--) {
            int position_of_max = 0;
            for (int j = 1; j <= count ; j++)
               if ( A[j] > A[position_of_max] )
                  position_of_max = j;
            ItemType temp = A[count];
            A[count] = A[position_of_max];
            A[position_of_max] = temp;
         }
      }
This piece of code defines a subroutine template. If you remove the first line, "template<class ItemType>", and substitute the word "int" for the word "ItemType" in the rest of the template, you get a subroutine for sorting arrays of ints. (Even though it says "class ItemType", you can actually substitute any type for ItemType, including the primitive types.) If you substitute "string" for "ItemType", you get a subroutine for sorting arrays of strings. This is pretty much what the compiler does with the template. If your program says "sort(list,10)" where list is an array of ints, the compiler uses the template to generate a subroutine for sorting arrays of ints. If you say "sort(cards,10)" where cards is an array of objects of type Card, then the compiler generates a subroutine for sorting arrays of Cards. At least, it tries to. The template uses the ">" operator to compare values. If this operator is defined for values of type Card, then the compiler will successfully use the template to generate a subroutine for sorting Cards. If ">" is not defined for Cards, then the compiler will fail -- but this will happen at compile time, not, as in Smalltalk, at run time where it would make the program crash.

C++ also has templates for making classes. If you write a template for binary trees, you can use it to generate classes for binary trees of ints, binary trees of strings, binary trees of dates, and so on -- all from one template. The most recent version of C++ comes with a large number of pre-written templates called the Standard Template Library or STL. The STL is quite complex. Many people would say that its much too complex. But it is also one of the most interesting features of C++.

Generic Programming in Java

Like C++, Java is a strongly typed language. However, generic programming in Java is closer in spirit to Smalltalk than it is to C++. As I've already noted, generic programming in Java is based on the fact that class Object is a superclass of every other class. To some extent, this makes Java similar to Smalltalk: A data structure designed to hold Objects can hold values belonging to any class. There is no need for templates or any other new language feature to support generic programming.

Of course, primitive type values, such as integers, are not objects in Java and therefor cannot be stored in generic data structures. In fact, there is no way to do generic programming with the primitive data types in Java. The Smalltalk approach doesn't work except for objects, and the C++ approach is not available. Furthermore, generic subroutines are more problematic in Java than they are in either Smalltalk or C++. In Smalltalk, a subroutine can be called with parameter values of any type, and it will work fine as long as all the operations used by the subroutine are supported by the actual parameters. In Java, parameters to a subroutine must be of a specified type, and the subroutine can only use operations that are defined for that type. A subroutine with a parameter of type Object can be applied to objects of any type, but the subroutine can only use operations that are defined in class Object, and there aren't many of those! For example, there is no comparison operation defined in the Object class, so it is not possible to write a completely generic sorting algorithm. We'll see below how Java addresses this problem.

Because of problems like these, some people (including myself) claim that Java does not really support true generic programming. Other people disagree. But whether it's true generic programming or not, that doesn't prevent it from being very useful.

Collections and Maps

Java's generic data structures can be divided into two categories: collections and maps. A collection is more or less what it sound like: a collection of objects. A map associates objects in one set with objects in another set in the way that a dictionary associates definitions with words or a phone book associates phone numbers with names. A map is similar to what I called an "association list" in Section 8.4.

There are two types of collections: lists and sets. A list is a collection in which the objects are arranged in a linear sequence. A list has a first item, a second item, and so on. For any item in the list, except the last, there is an item that directly follows it. A set is a collection in which no object can appear more than once.

Note that the terms "collection," "list," "set," and "map" tell you nothing about how the data is stored. A list could be represented as an array, as a linked list, or, for that matter, as a map that associates the elements of the list to the numbers 0, 1, 2, .... In fact, these terms are represented in Java not by classes but by interfaces. The interfaces Collection, List, Set, and Map specify the basic operations on data structures of these types, but do not specify how the data structures are to be represented or how the operations are to be implemented. That will be specified in the classes that implement the interfaces. Even when you use these classes, you might not know what the implementation is unless you go look at the source code. Java's generic data structures are abstract data types. They are defined by the operations that can be performed on them, not by the physical layout of the data in the computer.

We will look at list and set classes in Section 2 and map classes in Section 3. But before we do that, we'll look briefly at some of the general operations that are available for all collections.

Generic Algorithms and Iterators

The Collection interface includes methods for performing some basic operations on collections of objects. Since "collection" is a very general concept, operations that can be applied to all collections are also very general. They are generic operations in the sense that they can be applied to various types of collections containing various types of objects. Suppose that coll is any object that implements the Collection interface. Here are some of the operations that are defined:

coll.size() -- returns an int that gives the number of objects in the collection.
coll.isEmpty() -- returns a boolean value which is true if the size of the collection is 0.
coll.clear() -- removes all objects from the collection.
coll.contains(object) -- returns a boolean value that is true if object is in the collection.
coll.add(object) -- adds object to the collection. The parameter can be any Object. Some collections can contain the value null, while others cannot. This method returns a boolean value which tells you whether the operation actually modified the collection. For example, adding an object to a Set has no effect if that object was already in the set.
coll.remove(object) -- removes object from the collection, if it occurs in the collection, and returns a boolean value that tells you whether the object was found.
coll.containsAll(coll2) -- returns a boolean value that is true if every object in coll2 is also in the coll. The parameter can be any Collection.
coll.addAll(coll2) -- adds all the objects in the collection coll2 to coll.
coll.removeAll(coll2) -- removes every object from coll that also occurs in the collection coll2.
coll.retainAll(coll2) -- removes every object from coll that does not occur in the collection coll2. It "retains" only the objects that do occur in coll2.
coll.toArray() -- returns an array of type Object[] that contains all the items in the collection. The return value can be type-cast to another array type, if appropriate. For example, if you know that all the items in coll are of type String, then (String[])coll.toArray() gives you an array of Strings containing all the strings in the collection.

Since these methods are part of the Collection interface, they must be defined for every object that implements that interface. There is a problem with this, however. For example, the size of some kinds of Collection cannot be changed after they are created. Methods that add or remove objects don't make sense for these collections. While it is still legal to call the methods, an exception will be thrown when the call is evaluated at run time. The type of exception is UnsupportedOperationException.

There is also the question of efficiency. Even when an operation is defined for several types of collections, it might not be equally efficient in all cases. Even a method as simple as size() can vary greatly in efficiency. For some collections, computing the size() might involve counting the items in the collection. The number of steps in this process is equal to the number of items. Other collections might have instance variables to keep track of the size, so evaluating size() just means returning the value of a variable. In this case, the computation takes only one step, no matter how many items there are. When working with collections, it's good to have some idea of how efficient operations are and to choose a collection for which the operations you need can be implemented most efficiently. We'll see specific examples of this in the next two sections.

The Collection interface defines a few basic generic algorithms, but suppose you want to write your own generic algorithms. Suppose, for example, you want to do something as simple as printing out every item in a collection. To do this in a generic way, you need some way of going through an arbitrary collection, accessing each item in turn. We have seen how to do this for specific data structures: For an array, you can use a for loop to iterate through all the array indices. For a linked list, you can use a while loop in which you advance a pointer along the list. For a binary tree, you can use a recursive subroutine to do an infix traversal. Collections can be represented in any of these forms and many others besides. With such a variety of traversal mechanisms, how can we hope to come up with a single generic method that will work for collections that are stored in wildly different forms? This problem is solved by iterators. An iterator is an object that can be used to traverse a collection. Different types of collections have different types of iterators, but all iterators are used in the same way. An algorithm that uses an iterator to traverse a collection is generic, because the same technique can be applied to any type of collection. Iterators can seem rather strange to someone who is encountering generic programming for the first time, but you should understand that they solve a difficult problem in an elegant way.

The Collection interface defines a method that can be used to obtain an iterator for any collection. If coll is a collection, then coll.iterator() returns an iterator that can be used to traverse the collection. You should think of the iterator as a kind of generalized pointer that starts at the beginning of the collection and can move along the collection from one item to the next. Iterators are defined by an interface named Iterator. This interface defines just three methods. If iter refers to an Iterator, then:

iter.next() -- returns the next item, and advances the iterator. The return value is of type Object. Note that there is no way to look at an item without advancing the iterator past that item. If this method is called when no items remain, it will throw a NoSuchElementException.
iter.hasNext() -- returns a boolean value telling you whether there are more items to be processed. You should test this before calling iter.next().
iter.remove() -- if you call this after calling iter.next(), it will remove the item that you just saw from the collection. This might produce an UnsupportedOperationException, if the collection does not support removal of items.

Using iterators, we can write code for printing all the items in any collection. Suppose that coll is of type Collection. Then we can say:
            Iterator iter = coll.iterator();
            while ( iter.hasNext() ) {
               Object item = iter.next();
               System.out.println(item);
            }
The same general form will work for other types of processing. For example, here is a subroutine that will remove all null values from any collection (as long as that collection supports removal of values):
            void removeNullValues( Collection coll ) {
               Iterator iter = coll.iterator():
               while ( iter.hasNext() ) {
                   Object item = iter.next();
                   if (item == null)
                      iter.remove();
               }
            }
Collections can hold objects of any type, so the return value of iter.next() is Object. Now, there's not very much you can do with a general Object. In practical situations, a collection is used to hold objects belonging to some more specific class, and objects from the collection are type-cast to that class before they are used. Suppose, for example, that we are working with Shapes, where Shape is a class that represents geometric figures. Suppose that the Shape class includes a draw() method for drawing the figure. Then we can write a generic method for drawing all the figures in a collection of Shapes:
            void drawAllShapes( Collection shapeCollection ) {
                  // Precondition: Every item in shapeCollection is non-null
                  //               and belongs to the class Shape.
               Iterator iter = shapeCollection.iterator();
               while ( iter.hasNext() ) {
                  Shape figure = (Shape)iter.next();
                  figure.draw();
               }
            }
The precondition of this method points out that the method will fail if the method contains an item that does not belong to class Shape. When that item is encountered, the type-cast "(Shape)iter.next()" will cause an exception of type ClassCastException. Although it's unfortunate that we can't have a "Collection of Shapes" in Java, rather than a "Collection of Objects", it's not a big problem in practice. You just have to be aware of what type of objects you are storing in your collections.

Equality and Comparison

The discussion of methods in the Collection interface had an unspoken assumption: It was assumed that it's known what it means for two objects to be "equal." For example, the methods coll.contains(object) and coll.remove(object) look for an item in the collection that is equal to object. However, equality is not such a simple matter. The obvious technique for testing equality -- using the == operator -- does not usually give a reasonable answer when applied to objects. The == operator tests whether two objects are identical in the sense that they share the same location in memory. Usually, however, we want to consider two objects to be equal if they represent the same value, which is a very different thing. Two values of type String should be considered equal if they contain the same sequence of characters. The question of whether those characters are stored in the same location in memory is irrelevant. Two values of type Date should be considered equal if they represent the same time.

The Object class defines a boolean-valued method equals(Object) for testing whether one object is equal to another. For the purposes of collections, obj1 and obj2 are considered to be equal if they are both null, or if they are both non-null and obj1.equals(obj2) is true. In the Object class, obj1.equals(obj2) is defined to be the same as obj1 == obj2. However, for most sub-classes of Object, this definition is not reasonable, and it should be overridden. The String class, for example, overrides equals() so that for a String str, str.equals(obj) if obj is also a String and obj contains the same sequence of characters as str.

If you write your own class, you might want to define an equals() method in that class to get the correct behavior when objects are tested for equality. For example, a Card class that will work correctly when used in collections could be defined as:
        public class Card {  // Class to represent playing cards.
           int suit;  // Number from 0 to 3 that codes for the suit --
                      // spades, diamonds, clubs or hearts.
           int value; // Number from 1 to 13 that represents the value.
           public boolean equals(Object obj) {
               if (obj == null || ! (obj instanceof Card) ) {
                     // obj can't be equal to this Card if obj
                     // is not a Card, or if it is null.
                  return false;
               }
               else {
                  Card other = (Card)obj;  // Type-cast obj to a Card.
                  if (suit == other.suit && value == other.value) {
                        // The other card has the same suit and value as
                        // this card, so they should be considered equal.
                     return true;
                  }
                  else
                     return false;
               }
           }
            ... // other methods and constructors
        }
Without the equals() method in this class, methods such as contains() and remove() from the Collection interface will not work as expected for values of type Card.

A similar concern arises when items in a collection are sorted. Sorting refers to arranging a sequence of items in ascending order, according to some criterion. The problem is that there is no natural notion of ascending order for arbitrary objects. Before objects can be sorted, some method must be defined for comparing them. Objects that are meant to be compared should implement the interface java.lang.Comparable. This interface defines one method:
            public int compareTo(Object obj)
The value returned by obj1.compareTo(obj2) should be zero if and only if the objects are equal (that is, if obj1.equals(obj2) is true). It should be negative if and only if obj1 comes before obj2, when the objects are arranged in ascending order. And it should be positive if and only if obj1 comes after obj2. In general, it should be considered an error to call obj1.compareTo(obj2) if obj2 is not of the same type as obj1. The String class implements the Comparable interface and defines compareTo in a reasonable way. If you define your own class and want to be able to sort objects belonging to that class, you should do the same. For example:
      class FullName implements Comparable {
              // Represents a full name consisting of a first
              // name and a last name.
         String firstName, lastName;
         public boolean equals(Object obj) {
            if (obj == null || ! (obj instanceof FullName)) {
               return false;
            }
            else {
               FullName other = (FullName)obj;
               return firstName.equals(other.firstName)
                           && lastName.equals(other.lastName);
            }
         }
         public void compareTo(Object obj) {
            Fullname other = (FullName)obj;
               // Will cause an error if obj is not a FullName.
            if ( lastName.compareTo(other.lastName) < 0 ) {
                   // If lastName comes before the last name of
                   // the other object, then this FullName comes
                   // before the other FullName.  Return a negative
                   // value to indicate this.
               return -1;
            }
            if ( lastName.compareTo(other.lastName) > 0 ) {
                   // If lastName comes after the last name of
                   // the other object, then this FullName comes
                   // after the other FullName.  Return a positive
                   // value to indicate this.
               return 1;
            }
            else {
                   // Last names are the same, so base the comparison
                   // on the first names.
               return firstName.compareTo(other.firstName);
            }
         }
          ... // other methods and constructors
      }
There is another way to allow for comparison of objects in Java, and that is to provide a separate object that is capable of making the comparison. The object must implement the interface java.util.Comparator, which defines the method:
            public int compare(Object obj1, Object obj2)
This method compares two objects and returns a value that is negative, or zero, or positive depending on whether obj1 comes before obj2, or is the same as obj2, or comes after obj2. Comparators are useful for comparing objects that do not implement the Comparable interface and for defining several different orderings on the same collection of objects.

In the next two sections, we'll see how Comparable and Comparator are used in the context of collections and maps.

Wrapper Classes

As noted above, Java's generic programming does not apply to the primitive types. Before leaving this section, we should try to address this problem.

You can't store an integer in a generic data structure designed to hold Objects. On the other hand, there is nothing to stop you from making an object that contains an integer and putting that object into the data structure. In the simplest case, you could define a class that does nothing but contain an integer:
        public class IntContainer {
           public int value;
        }
In fact, Java already has a class similar to this one. An object belonging to the class java.lang.Integer contains a single int. It is called a wrapper for that int. The int value is provided as a parameter to the constructor. For example,
            Integer val = new Integer(17);
creates an Integer object that "wraps" the number 17. The Integer object can be used in generic data structures and in other situations where an object is required. The int value is stored in a private final instance variable of the Integer object. If val refers to an object of type Integer, you can find out what int it contains by calling the instance method val.intValue(). There is no way to change that value. We say that an Integer is an immutable object. That is, after it has been constructed, there is no way to change it. (Similarly, an object of type String is immutable.)

There are wrapper classes for all the primitive types. All objects belonging to these classes are immutable. The wrapper class for values of type double is java.lang.Double. The value stored in an object of type Double can be retrieved by calling the instance method doubleValue().

The wrapper classes define a number of useful methods. Some of them exist to support generic programming. For example, the wrapper classes all define instance methods equals(Object) and compareTo(Object) in a reasonable way. Other methods in the wrapper classes are utility functions for working with the primitive types. For example, we encountered the static methods Integer.parseInt(String) and Double.parseDouble(String) in Section 7.4. These functions are used to convert strings such as "42" or "2.71828" into the numbers they represent.

[ Next Section | Chapter Index | Main Index ]

Section 12.1 Generic Programming

Generic Programming in Smalltalk

Generic Programming in C++

Generic Programming in Java

Collections and Maps

Generic Algorithms and Iterators

Equality and Comparison

Wrapper Classes

Section 12.1
Generic Programming