Java Programming Introduction

Section 12.3
Map Classes

AN ARRAY OF N ELEMENTS can be thought of as a way of associating some item with each of the integers 0, 1, ..., N-1. If i is one of these integers, it's possible to get the item associated with i, and it's possible to put a new item in the i-th position. These "get" and "put" operations define what it means to be an array.

A Map is a kind of generalized array. Like an array, a map is defined by "get" and "put" operations. But in a map, these operations are defined not for integers 0, 1, ..., N-1, but for arbitrary Objects. In fact, some programming languages use the term associative array instead of "map" and use the same notation for associative arrays as for regular arrays. In those languages, for example, you might see the notation A["fred"] used to indicate the item associated to the string "fred" in the associative array A. Java does not use array notation for maps, but the idea is that same: A map is like an array, but the indices for a map are arbitrary objects, not integers. In a map, an object that serves as an "index" is called a key. The object that is associated with a key is called a value. Note that a key can have at most one associated value, but the same value can be associated to several different keys.

In Java, maps are defined by the interface java.util.Map, which includes put and get methods as well as other general methods for working with maps. If map is a variable of type Map, then these are some of the methods that are defined for map:

map.get(key) -- returns the Object that is associated by the map to the Object key. If the map does not associate any value with obj, then the return value is null. Note that it's also possible for the return value to be null when the map explicitely associates the value null with the key. Referring to "map.get(key)" is similar to referring to "A[key]" for an array A. (But note that there is nothing like an IndexOutOfBoundsException for maps.)
map.put(key,value) -- Associates the specified value with the specified key, where key and value can be any objects. If the map already associated some other value with the key, then the new value replaces the old one. This is similar to the command "A[key] = value" for an array.
map.putAll(map2) -- if map2 is any other map, this copies all the associations from map2 into map.
map.remove(key) -- if map associates a value to the specified key, that association is removed from the map.
map.containsKey(key) -- returns a boolean value that is true if the map associates some value to the specified key.
map.containsValue(value) -- returns a boolean value that is true if the map associates the specified value to some key.
map.size() -- returns an int that gives the number of associations in the map.
map.isEmpty() -- returns a boolean value that is true if the map is empty, that is if it contains no associations.
map.clear() -- removes all associations from the map, leaving it empty.

The put and get methods are certainly the most commonly used of the methods in the Map interface. In many applications, these are the only methods that are needed, and in such cases a map is really no more difficult to use than a standard array.

Java includes two classes that implement the Map interface: TreeMap and HashMap. In a TreeMap, the key/value associations are stored in a sorted tree, in which they are sorted according to their keys. For this to work, it must be possible to compare the keys to one another. This means either that the keys must implement the Comparable interface, or that a Comparator must be provided for comparing keys. (The Comparator can be provided as a parameter to the TreeMap constructor.)

A HashMap does not store associations in any particular order, so there are no restrictions on the keys that can be used in a HashMap. Most operations are a little faster on HashMaps than they are on TreeMaps. In general, you should use a HashMap unless you have some particular need for the ordering property of a TreeMap. In particular, if you are only using the put and get operations, you can use a HashMap.

Let's look at an example. In Section 8.4, I presented a simple PhoneDirectory class that associated phone numbers with names. That class defined operations addEntry(name,number) and getNumber(name), where both name and number are given as Strings. In fact, the phone directory is acting just like a map, with the addEntry method playing the role of the put operation and getNumber playing the role of get. In a real programming application, there would be no need to define a new class; we could simply use a Map. Using a Map does have the disadvantage that we are forced to work with Objects instead of Strings. If that is a problem, it's easy to define a phone directory class that uses a Map in its implementation:
            import java.util.HashMap;
            public class PhoneDirectory {
               private HashMap info = new HashMap(); // Stores the data for
                                                     // the phone directory.
               public void addEntry(String name, String number) {
                     // Record the phone number for a specified name.
                  info.put(name,number);
               }
               public String getNumber(String name) {
                     // Retrieve the phone number for a specified name.
                     // Returns null if there is no number for the name.
                  return (String)info.get(name);
               }
            } // end class PhoneDirectory
In the definition of the getNumber() method, the return value of info.get(name) is type-cast to type String. Since the return type of the get() method is Object, a type-cast is typically necessary before the return value can be used. By "wrapping" the Map in a PhoneDirectory class, we hide this unsightly type-cast in the implementation of the class and provide a more natural interface for the phone directory.

Views, SubSets, and SubMaps

A Map is not a Collection, and maps do not implement all the operations defined on collections. In particular, there are no iterators for maps. Sometimes, though, it's useful to be able to iterate through all the associations in a map. Java makes this possible is a roundabout but clever way.

If map is a variable of type Map, then the method:
            map.keySet()
returns the set of all objects that occur as keys for associations in the map. That is, the return value is an object that implements the Set interface. The elements of this set are the map's keys. The obvious way to implement the keySet() method would be to create a new set object, add all the keys from the map, and return that set. But that's not how it's done. The value returned by map.keySet() is not an independent object. It is what is called a view of objects that are stored in the map. This "view" of the map implements the Set interface, but it does it in such a way that the methods defined in the interface refer directly to keys in the map. For example, if you remove a key from the view, that key -- along with its associated value -- is actually removed from the map. It's not legal to add an object to the view, since it doesn't make sense to add a key to a map without specifying the value that should be associated to the key. Since map.keySet() does not create a new set, it's very efficient even for very large maps.

One of the things that you can do with a Set is get an Iterator for it and use the iterator to visit each of the elements of the set in turn. We can use an iterator for the key set of a map to traverse the map. For example:
      Set keys = map.keySet();         // The set of keys in the map.
      Iterator keyIter = keys.iterator();
      System.out.println("The map contains the following associations:");
      while (keyIter.hasNext()) {
         Object key = keyIter.next();  // Get the next key.
         Object value = map.get(key);  // Get the value for that key.
         System.out.println( "   (" + key + "," + value + ")" );
      }
If the map is a TreeMap, then the key set of the map is a sorted set, and the iterator will visit the keys in ascending order.

The Map interface defines two other views. If map is a variable of type Map, then the method:
            map.values()
returns a Collection that contains all the values from the associations that are stored in the map. The return value is a Collection rather than a Set because it can contain duplicate elements (since a map can associate the same value to any number of keys). The method:
            map.entrySet()
returns a Set that contains all the associations from the map. The information in this class is actually no different from the information in the map itself, but the set provides a different view of this information, with different operations. Each element of the entry set is an object belonging to the class Map.Entry. (This class is defined as a static nested class, so its full name contains a period. However, it can be used in the same way as any other class to declare variables and do type-casts.) A Map.Entry object contains one key/value pair, and defines methods getKey() and getValue() for retrieving the key and the value. There is also a method setValue(value) for setting the value. We could use the entry set of a map to print all the keys and values. This is more efficient than using the key set to print the same information, as I did in the above example, since we don't have to look up the value associated with each key:
      Set entries = map.entrySet();
      Iterator entryIter = entries.iterator();
      System.out.println("The map contains the following associations:");
      while (entryIter.hasNext()) {
         Map.Entry entry = (Map.Entry)entryIter.next();
         Object key = entry.getKey();  // Get the key from the entry.
         Object value = entry.getValue();  // Get the value.
         System.out.println( "   (" + key + "," + value + ")" );
      }
Maps are not the only place in Java's generic programming framework where views are used. For example, the List interface defines a sub-list as a view of a part of a list. The method:
            List subList(int fromIndex, int toIndex)
returns a view of the part of the list consisting of the list elements in positions between fromIndex and toIndex (including fromIndex but excluding toIndex). This view lets you operate on the sublist using any of the operations defined for lists, but the sublist is not an independent list. Changes made to the sublist are actually being made to the original list.

Similarly, it is possible to obtain views that represent certain subsets of a sorted set. If set is a TreeSet, then set.subSet(fromElement,toElement) returns a Set that contains all the elements of set that are between fromElement and toElement (including fromElement and excluding toElement). For example, if words is a TreeSet in which all the elements are Strings of lower case letters, then words.subSet("m","n") contains all the elements of words that begin with the letter m. This subset is a view of part of the original set. That is, creating the subset does not involve copying elements, and changes made to the subset, such as adding or removing elements, are actually made to the original set. The view set.headSet(toElement) consists of all elements from the set which are less than toElement, and set.tailSet(fromElement) is a view that contains all elements from the set that are greater than or equal to fromElement.

The TreeMap class defines three submap views. A submap is similar to a subset. A submap is a Map that contains a subset of the keys from the original Map, along with their associated values. If map is a variable of type TreeMap, then map.subMap(fromKey,toKey) returns a view that contains all key/value pairs from map whose keys are between fromKey and toKey (including fromKey and excluding toKey). There are also views map.headMap(toKey) and map.tailMap(fromKey) which are defined in the obvious way. Suppose, for example, that blackBook is a TreeMap in which the keys are names and the values are phone numbers. We can print out all the entries from blackBook where the name begins with "M" as follows:
      Map ems = blackBook.subMap("M","N");
           // This submap contains entries for which the key is greater
           // than or equal to "M" and strictly less than "N".
      if (ems.isEmpty())
         System.out.println("No entries beginning with M.");
      else {
         Iterator iter = ems.entrySet().iterator();
            // This iterator will traverse the entries in the submap, ems.
         while (iter.hasNext()) {
                // Get the next entry and print its key and value.
            Map.Entry entry = iter.next();
            System.out.println( entry.getKey() + ": " + entry.getValue() );
         }
      }
Subsets and submaps are probably best thought of as generalized search operations that make it possible to find all the items in a range of values, rather than just to find a single value. Suppose, for example that a database of scheduled events is stored in a TreeMap in which the keys are the times of the events, and suppose you want a listing of all events that are scheduled for some time on July 4, 2002. Just make a submap containing all keys in the range from 12:00 AM, July 4, 2002 to 12:00 AM, July 5, 2002, and output all the entries from that submap. This type of search, which is known as a subrange query is quite common.

Hash Tables

HashSets and HashMaps are implemented using a data structure known as a hash table. You don't need to understand hash tables to use HashSets or HashMaps, but any computer programmer should be familiar with hash tables and how they work.

Hash tables are an elegant solution to the search problem. A hash table, like a HashMap, stores key/value pairs. Given a key, you have to search the table for the corresponding key/value pair. When a hash table is used to implement a set, the values are all null, and the only question is whether or not the key occurs in the set. You still have to search for the key to check whether it is there or not.

In most search algorithms, in order to find the item you are interested in, you have to look through a bunch of other items that don't interest you. To find something in an unsorted list, you have to go though the items one-by-one until you come to the one you are looking for. In a binary sort tree, you have to start at the root and move down the tree until you find the item you want. When you search for a key/value pair in a hash table, you can go directly to the location that contains the item you want. You don't have to look through any other items. (This is not quite true, but it's close.) The location of the key/value pair is computed from the key: You just look at the key, and then you go directly to the location where it is stored.

How can this work? If the keys were integers in the range 0 to 99, we could store the key/value pairs in an array, A, of 100 elements. The key/value pair with key N would be stored in A[N]. The key takes us directly to the location of the key/value pair. The problem is that there are usually far too many different possible keys for us to be able to use an array with one location for each possible key. For example, if the key can be any value of type int, then we would need an array with over four billion locations -- quite a waste of space if we are only going to store, say, a few thousand items! If the key can be a string of any length, then the number of possible keys is infinite, and using an array with one location for each possible key is simply impossible.

Nevertheless, hash tables store their data in an array, and the array index where a key is stored is based on the key. The index is not equal to the key, but it is computed from the key. The array index for a key is called the hash code for that key. A function that computes a hash code, given a key, is called a hash function. To find a key in a hash table, you just have to compute the hash code of the key and go directly to the array location given by that hash code. If the hash code is 17, look in array location number 17.

Now, since there are fewer array locations than there are possible keys, it's possible that we might try to store two or more keys in the same array location. This is called a collision. A collision is not an error. We can't reject a key just because another key happened to have the same hash code. A hash table must be able to handle collisions in some reasonable way. In the type of hash table that is used in Java, each array location actually holds a linked list of key/value pairs (possibly an empty list). When two items have the same hash code, they are in the same linked list. The structure of the hash table looks something like this:

In this diagram, there is one item with hash code 0, no items with hash code 1, two items with hash code 2, and so on. In a properly designed hash table, most of the linked lists are of length zero or one, and the average length of the lists is less than one. Although the hash code of a key doesn't necessarily take you directly to that key, there are probably no more than one or two other items that you have to look through before finding the key you want. For this to work properly, the number of items in the hash table should be somewhat less than the number of locations in the array. In Java's implementation, whenever the number of items exceeds 75% of the array size, the array is replaced by a larger one and all the items in the old array are inserted into the new one.

The Object class defines the method hashCode(), which returns a value of type int. When an object, obj, is stored in a hash table that has N locations, a hash code in the range 0 to N-1 is needed. This hash code can be computed as Math.abs(obj.hashCode()) % N, the remainder when the absolute value of obj.hashCode() is divided by N. (The Math.abs is necessary because obj.hashCode() can be a negative integer, and we need a non-negative number to use as an array index.)

For hashing to work properly, two objects that are equal according to the equals() method should have the same hash code. In the Object class, both equals() and hashCode() are based on the address of the memory location where the object is stored. However, as noted in Section 1, many classes redefine the equals() method. If a class redefines the equals() method, and if objects of that class will be used as keys in hash tables, then the class should also redefine the hashCode() method. For example, in the String class, the equals method is redefined so that two objects of type String are considered to be equal if they contain the same sequence of characters. The hashCode() method is also redefined in the String class, so that the hash code of a string is computed from the characters in that string rather than from its location in memory. For Java's standard classes, you can expect equals() and hashCode() to be correctly defined. However, you might need to define these methods in classes that you write yourself.

[ Next Section | Previous Section | Chapter Index | Main Index ]

Section 12.3 Map Classes

Views, SubSets, and SubMaps

Hash Tables

Section 12.3
Map Classes