Follow Techotopia on Twitter

On-line Guides
All Guides
eBook Store
iOS / Android
Linux for Beginners
Office Productivity
Linux Installation
Linux Security
Linux Utilities
Linux Virtualization
Linux Kernel
System/Network Admin
Programming
Scripting Languages
Development Tools
Web Development
GUI Toolkits/Desktop
Databases
Mail Systems
openSolaris
Eclipse Documentation
Techotopia.com
Virtuatopia.com
Answertopia.com

How To Guides
Virtualization
General System Admin
Linux Security
Linux Filesystems
Web Servers
Graphics & Desktop
PC Hardware
Windows
Problem Solutions
Privacy Policy

  




 

 

Thinking in Java
Prev Contents / Index Next

Overriding hashCode( )

Now that you understand what’s involved in the function of the HashMap, the issues involved in writing a hashCode( ) will make more sense.

First of all, you don’t have control of the creation of the actual value that’s used to index into the array of buckets. That is dependent on the capacity of the particular HashMap object, and that capacity changes depending on how full the container is, and what the load factor is. The value produced by your hashCode( ) will be further processed in order to create the bucket index (in SimpleHashMap, the calculation is just a modulo by the size of the bucket array).

The most important factor in creating a hashCode( ) is that, regardless of when hashCode( ) is called, it produces the same value for a particular object every time it is called. If you end up with an object that produces one hashCode( ) value when it is put( ) into a HashMap and another during a get( ), you won’t be able to retrieve the objects. So if your hashCode( ) depends on mutable data in the object, the user must be made aware that changing the data will effectively produce a different key by generating a different hashCode( ).

In addition, you will probably not want to generate a hashCode( ) that is based on unique object information—in particular, the value of this makes a bad hashCode( ) because then you can’t generate a new identical key to the one used to put( ) the original key-value pair. This was the problem that occurred in SpringDetector.java, because the default implementation of hashCode( ) does use the object address. So you’ll want to use information in the object that identifies the object in a meaningful way.

One example can be seen in the String class. Strings have the special characteristic that if a program has several String objects that contain identical character sequences, then those String objects all map to the same memory (the mechanism for this is described in Appendix A). So it makes sense that the hashCode( ) produced by two separate instances of new String(“hello”) should be identical. You can see this in the following program:

//: c11:StringHashCode.java
import com.bruceeckel.simpletest.*;

public class StringHashCode {
  private static Test monitor = new Test();
  public static void main(String[] args) {
    System.out.println("Hello".hashCode());
    System.out.println("Hello".hashCode());
    monitor.expect(new String[] {
      "69609650",
      "69609650"
    });
  }
} ///:~


The hashCode( ) for String is clearly based on the contents of the String.

So for a hashCode( ) to be effective, it must be fast and it must be meaningful; that is, it must generate a value based on the contents of the object. Remember that this value doesn’t have to be unique—you should lean toward speed rather than uniqueness—but between hashCode( ) and equals( ), the identity of the object must be completely resolved.

Because the hashCode( ) is further processed before the bucket index is produced, the range of values is not important; it just needs to generate an int.

There’s one other factor: A good hashCode( ) should result in an even distribution of values. If the values tend to cluster, then the HashMap or HashSet will be more heavily loaded in some areas and will not be as fast as it could be with an evenly distributed hashing function.

In Effective Java (Addison-Wesley 2001), Joshua Bloch gives a basic recipe for generating a decent hashCode( ):

  1. Store some constant nonzero value, say 17, in an int variable called result.
  2. For each significant field f in your object (each field taken into account by the equals( ), that is), calculate an int hash code c for the field:

Field type

Calculation

boolean

c = (f ? 0 : 1)

byte, char, short, or int

c = (int)f

long

c = (int)(f ^ (f >>>32))

float

c = Float.floatToIntBits(f);

double

long l = Double.doubleToLongBits(f);
c = (int)(l ^ (l >>> 32))

Object, where equals( ) calls equals( ) for this field

c = f.hashCode( )

Array

Apply above rules to each element

  1. Combine the hash code(s) computed above:
    result = 37 * result + c;
  2. Return result.
  3. Look at the resulting hashCode( ) and make sure that equal instances have equal hash codes.

Here’s an example that follows these guidelines:

//: c11:CountedString.java
// Creating a good hashCode().
import com.bruceeckel.simpletest.*;
import java.util.*;

public class CountedString {
  private static Test monitor = new Test();
  private static List created = new ArrayList();
  private String s;
  private int id = 0;
  public CountedString(String str) {
    s = str;
    created.add(s);
    Iterator it = created.iterator();
    // Id is the total number of instances
    // of this string in use by CountedString:
    while(it.hasNext())
      if(it.next().equals(s))
        id++;
  }
  public String toString() {
    return "String: " + s + " id: " + id +
      " hashCode(): " + hashCode();
  }
  public int hashCode() {
    // Very simple approach:
    // return s.hashCode() * id;
    // Using Joshua Bloch's recipe:
    int result = 17;
    result = 37*result + s.hashCode();
    result = 37*result + id;
    return result;
  }
  public boolean equals(Object o) {
    return (o instanceof CountedString)
      && s.equals(((CountedString)o).s)
      && id == ((CountedString)o).id;
  }
  public static void main(String[] args) {
    Map map = new HashMap();
    CountedString[] cs = new CountedString[10];
    for(int i = 0; i < cs.length; i++) {
      cs[i] = new CountedString("hi");
      map.put(cs[i], new Integer(i));
    }
    System.out.println(map);
    for(int i = 0; i < cs.length; i++) {
      System.out.println("Looking up " + cs[i]);
      System.out.println(map.get(cs[i]));
    }
    monitor.expect(new String[] {
      "{String: hi id: 4 hashCode(): 146450=3," +
      " String: hi id: 10 hashCode(): 146456=9," +
      " String: hi id: 6 hashCode(): 146452=5," +
      " String: hi id: 1 hashCode(): 146447=0," +
      " String: hi id: 9 hashCode(): 146455=8," +
      " String: hi id: 8 hashCode(): 146454=7," +
      " String: hi id: 3 hashCode(): 146449=2," +
      " String: hi id: 5 hashCode(): 146451=4," +
      " String: hi id: 7 hashCode(): 146453=6," +
      " String: hi id: 2 hashCode(): 146448=1}",
      "Looking up String: hi id: 1 hashCode(): 146447",
      "0",
      "Looking up String: hi id: 2 hashCode(): 146448",
      "1",
      "Looking up String: hi id: 3 hashCode(): 146449",
      "2",
      "Looking up String: hi id: 4 hashCode(): 146450",
      "3",
      "Looking up String: hi id: 5 hashCode(): 146451",
      "4",
      "Looking up String: hi id: 6 hashCode(): 146452",
      "5",
      "Looking up String: hi id: 7 hashCode(): 146453",
      "6",
      "Looking up String: hi id: 8 hashCode(): 146454",
      "7",
      "Looking up String: hi id: 9 hashCode(): 146455",
      "8",
      "Looking up String: hi id: 10 hashCode(): 146456",
      "9"
    });
  }
} ///:~


CountedString includes a String and an id that represents the number of CountedString objects that contain an identical String. The counting is accomplished in the constructor by iterating through the static ArrayList where all the Strings are stored.

Both hashCode( ) and equals( ) produce results based on both fields; if they were just based on the String alone or the id alone, there would be duplicate matches for distinct values.

In main( ), a bunch of CountedString objects are created, using the same String to show that the duplicates create unique values because of the count id. The HashMap is displayed so that you can see how it is stored internally (no discernible orders), and then each key is looked up individually to demonstrate that the lookup mechanism is working properly.

Writing a proper hashCode( ) and equals( ) for a new class can be tricky. You can find tools to help you do this in Apache’s “Jakarta Commons” project at jakarta.apache.org/commons, under “lang” (this project also has many other potentially useful libraries, and appears to be the Java community’s answer to the C++ community’s www.boost.org).

Thinking in Java
Prev Contents / Index Next

 
 
   Reproduced courtesy of Bruce Eckel, MindView, Inc. Design by Interspire