|
Choosing between Sets
You can choose a TreeSet, a HashSet, or a LinkedHashSet depending on the behavior you desire. The following test program gives an indication of the performance trade-off between the implementations:
//: c11:SetPerformance.java
// {Args: 500}
import java.util.*;
import com.bruceeckel.util.*;
public class SetPerformance {
private static int reps = 50000;
private abstract static class Tester {
String name;
Tester(String name) { this.name = name; }
abstract void test(Set s, int size);
}
private static Tester[] tests = {
new Tester("add") {
void test(Set s, int size) {
for(int i = 0; i < reps; i++) {
s.clear();
Collections2.fill(s,
Collections2.countries.reset(),size);
}
}
},
new Tester("contains") {
void test(Set s, int size) {
for(int i = 0; i < reps; i++)
for(int j = 0; j < size; j++)
s.contains(Integer.toString(j));
}
},
new Tester("iteration") {
void test(Set s, int size) {
for(int i = 0; i < reps * 10; i++) {
Iterator it = s.iterator();
while(it.hasNext())
it.next();
}
}
},
};
public static void test(Set s, int size) {
// Strip qualifiers from class name:
System.out.println("Testing " +
s.getClass().getName().replaceAll("\\w+\\.", "") +
" size " + size);
Collections2.fill(s,
Collections2.countries.reset(), size);
for(int i = 0; i < tests.length; i++) {
System.out.print(tests[i].name);
long t1 = System.currentTimeMillis();
tests[i].test(s, size);
long t2 = System.currentTimeMillis();
System.out.println(": " +
((double)(t2 - t1)/(double)size));
}
}
public static void main(String[] args) {
// Choose a different number of
// repetitions via the command line:
if(args.length > 0)
reps = Integer.parseInt(args[0]);
System.out.println(reps + " repetitions");
// Small:
test(new TreeSet(), 10);
test(new HashSet(), 10);
test(new LinkedHashSet(), 10);
// Medium:
test(new TreeSet(), 100);
test(new HashSet(), 100);
test(new LinkedHashSet(), 100);
// Large:
test(new TreeSet(), 1000);
test(new HashSet(), 1000);
test(new LinkedHashSet(), 1000);
}
} ///:~
The following table shows the results of one run. (Of course, this will be different according to the computer and JVM you are using; you should run the test yourself as well):
Type
|
Test size
|
Add
|
Contains
|
Iteration
|
|
10
|
25.0
|
23.4
|
39.1
|
TreeSet
|
100
|
17.2
|
27.5
|
45.9
|
|
1000
|
26.0
|
30.2
|
9.0
|
|
10
|
18.7
|
17.2
|
64.1
|
HashSet
|
100
|
17.2
|
19.1
|
65.2
|
|
1000
|
8.8
|
16.6
|
12.8
|
|
10
|
20.3
|
18.7
|
64.1
|
LinkedHashSet
|
100
|
18.6
|
19.5
|
49.2
|
|
1000
|
10.0
|
16.3
|
10.0
|
The performance of HashSet is generally superior to TreeSet for all operations (but in particular for addition and lookup, the two most important operations). The only reason TreeSet exists is because it maintains its elements in sorted order, so you use it only when you need a sorted Set.
Note that LinkedHashSet is slightly more expensive for insertions than HashSet; this is because of the extra cost of maintaining the linked list along with the hashed container. However, traversal is cheaper with LinkedHashSet because of the linked list.
|
|