Solution for
Programming Exercise 10.1
THIS PAGE DISCUSSES ONE POSSIBLE SOLUTION to
the following exercise from this on-line
Java textbook.
Exercise 10.1:
The WordList program from Section 10.3 reads a text file
and makes an alphabetical list of all the words in that file.
The list of words is output to another file. Improve the
program so that it also keeps track of the number of times that
each word occurs in the file. Write two lists to the output
file. The first list contains the words in alphabetical order.
The number of times that the word occurred in the file should
be listed along with the word. Then write a second list to the
output file in which the words are sorted according to the number of
times that they occurred in the files. The word that occurred
most often should be listed first.
Discussion
In the original WordList program, the words from the file are stored in
an array of Strings. For this exercise, we also need to keep track
of the number of times a given string has been encountered in the file.
One possibility would be to add an array of ints to store the
counts. This would be using "parallel arrays" to store the data.
Another approach is to create a small class to hold the word and its
associated counter. Then the data can be stored in an array of objects
belonging to that class. In my program, I used the latter approach.
The class that I defined to hold the data for a word is:
class WordData {
// A little class to hold the data for one word.
String word; // The word (in lower case letters).
int count; // The number of times the word has been found.
WordData(String w) {
// Constructor creates an object with the specified word
// and with the counter initialized to 1.
word = w;
count = 1;
}
}
Note that I've included a constructor to make it easy to create
objects of type WordData. This constructor will be used
whenever a new word is encountered in the file for the first time.
At that time, the
count should be one. The constructor takes care of
this detail. After that, each time we encounter the word again, the
count for that word is increased by 1. Aside from this,
the process of reading the file and storing the data in the array
is essentially the same as it was in the WordList program.
You can check the differences -- shown in red -- in the main()
routine and the insertWord() subroutine in the solution that
is given below. In insertWord, note in particular that
the word in position pos in the array is words[pos].word,
while the frequency counter for that word is words[pos].count.
For example, the command for adding 1 to the count is
"words[pos].count++;".
Since the list of words is output twice, I've written a subroutine,
putWordList(), that writes the words to an output stream. I've
tried to make the output reasonably neat by outputting the words in
a column that is 15 spaces wide. The other subroutine that I've
added is sortByFrequency(), which will rearrange the words
in the array so that they are in order of decreasing frequency.
This routine is called by the main program before the word list is
printed for the second time. The sorting routine uses a
standard Insertion Sort algorithm, as given in
Section 8.4, except that it sorts
into decreasing order instead of increasing.
The Solution
Significant changes from WordList.java are shown in
red.
/*
This program lets the user specify a text file for input and a file
for output. All the words are read from the input file. Words are
converted to lower case. The program makes a list of all the words
that occur in the file, along with the number of times that each
word occurs. The words and word frequencies are printed to an
output file. The list is printed twice: once in alphabetical order
and once in order of decreasing frequency. A word in this program is
defined to be any sequence of letters.
This class depends on the non-standard classes TextIO and TextReader.
*/
import java.io.*;
class WordData {
// A little class to hold the data for one word.
String word; // The word (in lower case letters).
int count; // The number of times the word has been found.
WordData(String w) {
// Constructor creates an object with the specified word
// and with the counter initialized to 1.
word = w;
count = 1;
}
}
public class WordFrequencies {
static WordData[] words; // An array to hold the words from the file.
// Note that the array will be expanded as
// necessary, in the insertWord() subroutine.
static int wordCount; // The number of words currently stored in
// the array.
public static void main(String[] args) {
TextReader in; // A stream for reading from the input file.
PrintWriter out; // A stream for writing to the output file.
String inputFileName; // Input file name, specified by the user.
String outputFileName; // Output file name, specified by the user.
words = new WordData[10]; // Start with space for 10 words.
wordCount = 0; // Currently, there are no words in the array.
/* Get the input file name from the user and try to create the
input stream. If there is a FileNotFoundException, print
a message and terminate the program. */
TextIO.put("Input file name? ");
inputFileName = TextIO.getln().trim();
try {
in = new TextReader(new FileReader(inputFileName));
}
catch (FileNotFoundException e) {
TextIO.putln("Can't find file \"" + inputFileName + "\".");
return;
}
/* Get the output file name from the user and try to create the
output stream. If there is an IOException, print a message
and terminate the program. */
TextIO.put("Output file name? ");
outputFileName = TextIO.getln().trim();
try {
out = new PrintWriter(new FileWriter(outputFileName));
}
catch (IOException e) {
TextIO.putln("Can't open file \"" + outputFileName + "\" for output.");
TextIO.putln(e.toString());
return;
}
/* Read all the words from the input stream and insert them into
the array of words. Reading from a TextReader can result in
an error of type TextReader.Error. If one occurs, print an
error message and terminate the program. */
try {
while (true) {
// Skip past and non-letters in the input stream. If an
// end-of-stream has been reached, end the loop. Otherwise,
// read a word and insert it into the array of words.
while ( ! in.eof() && ! Character.isLetter(in.peek()) )
in.getAnyChar();
if (in.eof())
break;
insertWord(in.getAlpha());
}
}
catch (TextReader.Error e) {
TextIO.putln("An error occurred while reading from the input file.");
TextIO.putln(e.toString());
return;
}
/* Output the words in alphabetical order. */
out.println("Words in alphabetical order, with the number of occurrences:");
out.println("-----------------------------------------------------------");
out.println();
putWordList(out);
/* Sort the list of words according the frequency with which
they were found in the file, and then output the list again. */
sortByFrequency();
out.println();
out.println();
out.println("Words in order of frequency:");
out.println("---------------------------");
out.println();
putWordList(out);
if (out.checkError()) {
TextIO.putln("Some error occurred while writing to the output file.");
TextIO.putln("The output might be missing, incomplete, or corrupted.");
}
else {
TextIO.putln("Done.");
}
} // end main()
static void insertWord(String w) {
// Insert the word w into the array of words, unless it already
// appears there. If the word already appears in the list,
// add 1 to the counter for that word. The words in the array are in
// lower case, and w is converted to lower case before it is processed.
// Note that the words in the array are kept in alphabetical order.
// If the array has no more space to hold w, then it is doubled
// in size.
int pos = 0; // This will be the position in the array where w belongs.
w = w.toLowerCase();
/* Find the position in the array where w belongs, after all the
words that precede w alphabetically. If a copy of w already
occupies that position, then it is not necessary to insert
w, so just add 1 to the counter associated with the word
and return. */
while (pos < wordCount && words[pos].word.compareTo(w) < 0)
pos++;
if (pos < wordCount && words[pos].word.equals(w)) {
words[pos].count++;
return;
}
/* If the array is full, make a new array that is twice as
big, copy all the words from the old array to the new,
and set the variable, words, to refer to the new array. */
if (wordCount == words.length) {
WordData[] newWords = new WordData[words.length*2];
System.arraycopy(words,0,newWords,0,wordCount);
words = newWords;
}
/* Put w into its correct position in the array. Move any
words that come after w up one space in the array to
make room for w. Create a new WordData object to hold
the new word. */
for (int i = wordCount; i > pos; i--)
words[i] = words[i-1];
words[pos] = new WordData(w);
wordCount++;
} // end insertWord()
static void putWordList(PrintWriter output) {
// Write the list of words from the words array, along with their
// associated frequencies, to the output stream specified in the
// parameter to this subroutine. Words are output in a column
// that is 15 spaces wide. If an individual word is longer than
// 15 spaces, the output won't be quite so neat.
for (int i = 0; i < wordCount; i++) {
output.print(" ");
output.print(words[i].word);
for (int space = words[i].word.length(); space < 15; space++)
output.print(" ");
output.print(" ");
output.println(words[i].count);
}
} // end putWordList()
static void sortByFrequency() {
// Use insertion sort to sort the words in the list so that they
// are in order of decreasing frequency. (Note that insertion sort
// has the following neat property: In a group of words that all
// have the same frequency, the words will remain in alphabetical
// order after the words have been sorted by frequency.)
for (int i = 1; i < wordCount; i++) {
WordData temp = words[i]; // Save the word in position i.
int location; // An index in the array.
location = i - 1;
while (location >= 0 && words[location].count < temp.count) {
// Bump words that have frequencies less than the frequency
// of temp up one space in the array.
words[location + 1] = words[location];
location--;
}
words[location + 1] = temp; // Put temp in last vacated space.
}
} // end sortByFrequency()
} // end class WordFrequencies
[ Exercises
| Chapter Index
| Main Index
]