Section 12.3
Map Classes
AN ARRAY OF N ELEMENTS can be thought
of as a way of associating some item with each of the integers
0, 1, ..., N-1. If i is
one of these integers, it's possible to get
the item associated with i, and it's possible to
put a new item in the i-th position.
These "get" and "put" operations define what it means to be an array.
A Map is a kind of generalized array.
Like an array, a map is defined by "get" and "put" operations.
But in a map, these operations are defined not for integers
0, 1, ..., N-1, but for arbitrary
Objects. In fact, some programming languages use the
term associative array instead of "map"
and use the same notation for associative arrays as for regular
arrays. In those languages, for example, you might see the notation
A["fred"] used to indicate the item associated to the string
"fred" in the associative array A. Java does not use array notation
for maps, but the idea is that same: A map is like an array, but
the indices for a map are arbitrary objects, not integers.
In a map, an object that serves as an "index" is called a
key. The object that is associated
with a key is called a value.
Note that a key can have at most one associated value, but the same
value can be associated to several different keys.
In Java, maps are defined by the interface java.util.Map,
which includes put and get methods as well as
other general methods for working with maps. If map is
a variable of type Map, then these are some of the methods
that are defined for map:
- map.get(key) -- returns the
Object that is associated by the map to the Object key.
If the map does not associate any value with obj, then the return
value is null. Note that it's also possible for the return
value to be null when the map explicitely associates the
value null with the key.
Referring to "map.get(key)" is similar to referring to "A[key]"
for an array A. (But note that there is nothing like an
IndexOutOfBoundsException for maps.)
- map.put(key,value) --
Associates the specified value with the specified key,
where key and value can be any objects. If the
map already associated some other value with the key, then the new
value replaces the old one.
This is similar to the command "A[key] = value"
for an array.
- map.putAll(map2) -- if
map2 is any other map, this copies all the associations
from map2 into map.
- map.remove(key) --
if map associates a value to the specified key,
that association is removed from the map.
- map.containsKey(key) -- returns
a boolean value that is true if the map associates some
value to the specified key.
- map.containsValue(value) -- returns
a boolean value that is true if the map associates the
specified value to some key.
- map.size() -- returns an int
that gives the number of associations in the map.
- map.isEmpty() -- returns a boolean
value that is true if the map is empty, that is if it contains
no associations.
- map.clear() -- removes all
associations from the map, leaving it empty.
The put and get methods are certainly the most commonly
used of the methods in the Map interface. In many applications, these
are the only methods that are needed, and in such cases a map is really no
more difficult to use than a standard array.
Java includes two classes that implement the Map interface:
TreeMap and HashMap. In a TreeMap, the key/value
associations are stored in a sorted tree, in which they are sorted according
to their keys. For this to work, it must be possible to compare
the keys to one another. This means either that the keys must implement
the Comparable interface, or that a Comparator must
be provided for comparing keys. (The Comparator can be provided
as a parameter to the TreeMap constructor.)
A HashMap does not store associations in any particular order,
so there are no restrictions on the keys that can be used in a HashMap.
Most operations are a little faster on HashMaps than they are
on TreeMaps. In general, you should use a HashMap unless
you have some particular need for the ordering property of a TreeMap.
In particular, if you are only using the put and get operations,
you can use a HashMap.
Let's look at an example. In Section 8.4,
I presented a simple PhoneDirectory class that associated phone numbers with
names. That class defined operations addEntry(name,number) and
getNumber(name), where both name and number are
given as Strings. In fact, the phone directory is acting just like
a map, with the addEntry method playing the role of the
put operation and getNumber playing the role of get.
In a real programming application, there would be no need to define a new
class; we could simply use a Map. Using a Map does have
the disadvantage that we are forced to work with Objects instead
of Strings. If that is a problem, it's easy to define a
phone directory class that uses a Map in its implementation:
import java.util.HashMap;
public class PhoneDirectory {
private HashMap info = new HashMap(); // Stores the data for
// the phone directory.
public void addEntry(String name, String number) {
// Record the phone number for a specified name.
info.put(name,number);
}
public String getNumber(String name) {
// Retrieve the phone number for a specified name.
// Returns null if there is no number for the name.
return (String)info.get(name);
}
} // end class PhoneDirectory
In the definition of the getNumber() method, the return
value of info.get(name) is type-cast to type String.
Since the return type of the get() method is Object,
a type-cast is typically necessary before the return value can be used.
By "wrapping" the Map in a PhoneDirectory class,
we hide this unsightly type-cast in the implementation of the class
and provide a more natural interface for the phone directory.
Views, SubSets, and SubMaps
A Map is not a Collection, and maps do not
implement all the operations defined on collections. In particular,
there are no iterators for maps. Sometimes, though, it's useful
to be able to iterate through all the associations in a map.
Java makes this possible is a roundabout but clever way.
If map is a variable of type Map, then the
method:
map.keySet()
returns the set of all objects that occur as keys for associations
in the map. That is, the return value is an object that implements
the Set interface. The elements of this set are the map's
keys. The obvious way to implement the keySet() method would
be to create a new set object, add all the keys from the map, and
return that set. But that's not how it's done. The value returned
by map.keySet() is not an independent object. It is what is
called a view of objects that are stored
in the map. This "view" of the map implements the Set interface,
but it does it in such a way that the methods defined in the interface
refer directly to keys in the map. For example, if you remove a
key from the view, that key -- along with its associated value -- is
actually removed from the map. It's not legal to add an object
to the view, since it doesn't make sense to add a key to a map
without specifying the value that should be associated to the key.
Since map.keySet() does not create a new set, it's very
efficient even for very large maps.
One of the things that you can do with a Set is get an
Iterator for it and use the iterator to visit each of
the elements of the set in turn. We can use an iterator for
the key set of a map to traverse the map. For example:
Set keys = map.keySet(); // The set of keys in the map.
Iterator keyIter = keys.iterator();
System.out.println("The map contains the following associations:");
while (keyIter.hasNext()) {
Object key = keyIter.next(); // Get the next key.
Object value = map.get(key); // Get the value for that key.
System.out.println( " (" + key + "," + value + ")" );
}
If the map is a TreeMap, then the key set of the map is a
sorted set, and the iterator will visit the keys in ascending order.
The Map interface defines two other views. If map
is a variable of type Map, then the method:
map.values()
returns a Collection that contains all the values from the
associations that are stored in the map. The return value is a
Collection rather than a Set because it can contain
duplicate elements (since a map can associate the same value to any
number of keys).
The method:
map.entrySet()
returns a Set that contains all the associations from the map.
The information in this class is actually no different from the information
in the map itself, but the set provides a different view of this information,
with different operations. Each element of the entry set is an object belonging
to the class Map.Entry. (This class is defined as a static nested class,
so its full name contains a period. However, it can be used in the same way as
any other class to declare variables and do type-casts.)
A Map.Entry object contains one key/value pair,
and defines methods getKey() and getValue() for retrieving
the key and the value. There is also a method setValue(value) for setting
the value. We could use the entry set of a map to print all the keys and values.
This is more efficient than using the key set to print the same information,
as I did in the above example, since we don't have to look up the value
associated with each key:
Set entries = map.entrySet();
Iterator entryIter = entries.iterator();
System.out.println("The map contains the following associations:");
while (entryIter.hasNext()) {
Map.Entry entry = (Map.Entry)entryIter.next();
Object key = entry.getKey(); // Get the key from the entry.
Object value = entry.getValue(); // Get the value.
System.out.println( " (" + key + "," + value + ")" );
}
Maps are not the only place in Java's generic programming
framework where views are used. For example, the List
interface defines a sub-list as
a view of a part of a list. The method:
List subList(int fromIndex, int toIndex)
returns a view of the part of the list consisting of the
list elements in positions between fromIndex
and toIndex (including fromIndex but excluding
toIndex). This view lets you operate on the sublist
using any of the operations defined for lists, but the sublist
is not an independent list. Changes made to the sublist are
actually being made to the original list.
Similarly, it is possible to obtain views that represent certain
subsets of a sorted set. If set is a TreeSet,
then set.subSet(fromElement,toElement) returns a Set
that contains all the elements of set that are between
fromElement and toElement (including fromElement
and excluding toElement). For example, if words is
a TreeSet in which all the elements are Strings of lower case letters,
then words.subSet("m","n") contains all the elements of words
that begin with the letter m. This subset is a view of part of the original
set. That is, creating the subset does not involve copying elements, and
changes made to the subset, such as adding or removing elements, are actually
made to the original set. The view set.headSet(toElement) consists
of all elements from the set which are less than toElement, and
set.tailSet(fromElement) is a view that contains all elements from
the set that are greater than or equal to fromElement.
The TreeMap class defines three submap views. A submap is
similar to a subset. A submap is a Map that contains a subset of the keys
from the original Map, along
with their associated values. If map is a variable of type
TreeMap, then map.subMap(fromKey,toKey) returns a view
that contains all key/value pairs from map whose keys are between fromKey
and toKey (including fromKey and excluding toKey).
There are also views map.headMap(toKey) and map.tailMap(fromKey)
which are defined in the obvious way. Suppose, for example, that blackBook
is a TreeMap in which the keys are names and the values are phone numbers.
We can print out all the entries from blackBook where the name begins
with "M" as follows:
Map ems = blackBook.subMap("M","N");
// This submap contains entries for which the key is greater
// than or equal to "M" and strictly less than "N".
if (ems.isEmpty())
System.out.println("No entries beginning with M.");
else {
Iterator iter = ems.entrySet().iterator();
// This iterator will traverse the entries in the submap, ems.
while (iter.hasNext()) {
// Get the next entry and print its key and value.
Map.Entry entry = iter.next();
System.out.println( entry.getKey() + ": " + entry.getValue() );
}
}
Subsets and submaps are probably best thought of as generalized search
operations that make it possible to find all the items in a range of
values, rather than just to find a single value. Suppose, for example
that a database of scheduled events is stored in a TreeMap
in which the keys are the times of the events, and suppose you want
a listing of all events that are scheduled for some time on July 4, 2002.
Just make a submap containing all keys in the range from
12:00 AM, July 4, 2002 to 12:00 AM, July 5, 2002, and output
all the entries from that submap. This type of search,
which is known as a subrange query
is quite common.
Hash Tables
HashSets and HashMaps are implemented using a data
structure known as a hash table. You don't
need to understand hash tables to use HashSets or HashMaps,
but any computer programmer should be familiar with hash tables and
how they work.
Hash tables are an elegant solution to the search problem. A hash
table, like a HashMap, stores key/value pairs. Given a key,
you have to search the table for the corresponding key/value pair.
When a hash table is used to implement a set, the values are all null,
and the only question is whether or not the key occurs in the set.
You still have to search for the key to check whether it is there
or not.
In most
search algorithms, in order to find the item you are interested in,
you have to look through a bunch of other items that don't interest
you. To find something in an unsorted list, you have to go though
the items one-by-one until you come to the one you are looking for.
In a binary sort tree, you have to start at the root and move down
the tree until you find the item you want. When you search for a
key/value pair in a hash table, you can go directly to the location that
contains the item you want. You don't have to look through
any other items. (This is not quite true, but it's close.)
The location of the key/value pair is computed from the key: You just look at
the key, and then you go directly to the location where it is
stored.
How can this work?
If the keys were integers in the range 0 to 99, we could store the
key/value pairs in an array, A, of 100 elements. The key/value
pair with key N would be stored in A[N]. The key
takes us directly to the location of the key/value pair. The problem
is that there are usually far too many different possible keys for us
to be able to use an array with one location for each possible key.
For example, if the key can be any value of type int,
then we would need an array with over four billion locations -- quite
a waste of space if we are only going to store, say, a few thousand
items! If the key can be a string of any length, then the number
of possible keys is infinite, and using an array with one location
for each possible key is simply impossible.
Nevertheless, hash tables store their data in an array, and the
array index where a key is stored is based on the key. The index is not equal
to the key, but it is computed from the key. The array index for a
key is called the hash code for that
key. A function that computes a hash code, given a key,
is called a hash function. To find
a key in a hash table, you just have to compute the hash code of
the key and go directly to the array location given by that hash code.
If the hash code is 17, look in array location number 17.
Now, since there are fewer array locations than there are possible keys,
it's possible that we might try to store two or more keys in the
same array location. This is called a collision.
A collision is not an error. We can't reject a key just because another
key happened to have the same hash code. A hash table must be able to handle
collisions in some reasonable way. In the type of hash table that
is used in Java, each array location actually holds a linked list of key/value
pairs (possibly an empty list). When two items have the same hash code, they are in the
same linked list. The structure of the hash table looks something
like this:
In this diagram, there is one item with hash code 0, no items with hash
code 1, two items with hash code 2, and so on. In a properly designed
hash table, most of the linked lists are of length zero or one, and
the average length of the lists is less than one. Although the hash code
of a key doesn't necessarily take you directly to that key, there are
probably no more than one or two other items that you have to look through
before finding the key you want. For this to work properly, the
number of items in the hash table should be somewhat less than the
number of locations in the array. In Java's implementation, whenever
the number of items exceeds 75% of the array size, the array is
replaced by a larger one and all the items in the old array are
inserted into the new one.
The Object class defines the method hashCode(), which
returns a value of type int. When an object, obj, is stored
in a hash table that has N locations, a hash code in the range
0 to N-1 is needed. This hash code can be computed
as Math.abs(obj.hashCode()) % N, the remainder when
the absolute value of obj.hashCode() is divided by N.
(The Math.abs is necessary because obj.hashCode() can be
a negative integer, and we need a non-negative number to use as an
array index.)
For hashing to work properly, two objects that are equal according to the
equals() method should have the same hash code. In the Object
class, both equals() and hashCode() are based on the
address of the memory location where the object is stored. However, as noted
in Section 1, many classes redefine the equals()
method. If a class redefines the equals() method, and if objects
of that class will be used as keys in hash tables, then the class should
also redefine the hashCode() method.
For example, in the String class, the equals method
is redefined so that two objects of type String are considered to
be equal if they contain the same sequence of characters.
The hashCode() method is also redefined in the String class,
so that the hash code of a string is computed from the characters in that
string rather than from its location in memory. For Java's standard classes,
you can expect equals() and hashCode() to be correctly
defined. However, you might need to define these methods in classes that
you write yourself.