Thinking in C++ Vol 2 - Practical Programming |
Prev |
Home |
Next |
The set container accepts only one copy of each element.
It also sorts the elements. (Sorting isn t intrinsic to the conceptual
definition of a set, but the STL set stores its elements in a balanced
tree data structure to provide rapid lookups, thus producing sorted results
when you traverse it.) The first two examples in this chapter used sets.
Consider the problem of creating an index for a book. You
might like to start with all the words in the book, but you only want one
instance of each word, and you want them sorted. A set is perfect for
this and solves the problem effortlessly. However, there s also the problem of
punctuation and any other nonalpha characters, which must be stripped off to
generate proper words. One solution to this problem is to use the Standard C
library functions isalpha( ) and isspace( ) to extract
only the characters you want. You can replace all unwanted characters with
spaces so that you can easily extract valid words from each line you read:
//: C07:WordList.cpp
// Display a list of words used in a document.
#include <algorithm>
#include <cctype>
#include <cstring>
#include <fstream>
#include <iostream>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include "../require.h"
using namespace std;
char replaceJunk(char c) {
// Only keep alphas, space (as a delimiter), and '
return (isalpha(c) || c == '\'') ? c : ' ';
}
int main(int argc, char* argv[]) {
char* fname = "WordList.cpp";
if(argc > 1) fname = argv[1];
ifstream in(fname);
assure(in, fname);
set<string> wordlist;
string line;
while(getline(in, line)) {
transform(line.begin(), line.end(), line.begin(),
replaceJunk);
istringstream is(line);
string word;
while(is >> word)
wordlist.insert(word);
}
// Output results:
copy(wordlist.begin(), wordlist.end(),
ostream_iterator<string>(cout,
"\n"));
} ///:~
The call to transform( ) replaces each character
to be ignored with a space. The set container not only ignores duplicate words,
but compares the words it keeps according to the function object less<string>
(the default second template argument for the set container), which in
turn uses string::operator<( ), so the words emerge in
alphabetical order.
You don t need to use a set just to get a sorted
sequence. You can use the sort( ) function (along with a multitude
of other functions in the STL) on different STL containers. However, it s
likely that set will be faster here. Using a set is particularly handy
when you just want to do lookup, since its find( ) member function has logarithmic complexity and so is much faster than the generic find( )
algorithm. As you recall, the generic find( ) algorithm needs to
traverse the whole range until it finds the search element (resulting in a
worst-case complexity of N, and an average complexity of N/2). However, if you
have a sequence container that is already sorted, use equal_range( )
for logarithmic complexity when finding elements.
The following version shows how to build the list of words
with an istreambuf_iterator that moves the characters from one place
(the input stream) to another (a string object), depending on whether
the Standard C library function isalpha( ) returns true:
//: C07:WordList2.cpp
// Illustrates istreambuf_iterator and insert
iterators.
#include <cstring>
#include <fstream>
#include <iostream>
#include <iterator>
#include <set>
#include <string>
#include "../require.h"
using namespace std;
int main(int argc, char* argv[]) {
char* fname = "WordList2.cpp";
if(argc > 1) fname = argv[1];
ifstream in(fname);
assure(in, fname);
istreambuf_iterator<char> p(in), end;
set<string> wordlist;
while(p != end) {
string word;
insert_iterator<string> ii(word, word.begin());
// Find the first alpha character:
while(p != end && !isalpha(*p))
++p;
// Copy until the first non-alpha character:
while(p != end && isalpha(*p))
*ii++ = *p++;
if(word.size() != 0)
wordlist.insert(word);
}
// Output results:
copy(wordlist.begin(), wordlist.end(),
ostream_iterator<string>(cout,
"\n"));
} ///:~
This example was suggested by Nathan Myers, who invented the
istreambuf_iterator and its relatives. This iterator extracts
information character by character from a stream. Although the istreambuf_iterator
template argument might imply that you could extract, for example, ints
instead of char, that s not the case. The argument must be of some
character type a regular char or a wide character.
After the file is open, an istreambuf_iterator called
p is attached to the istream so characters can be extracted from
it. The set<string> called wordlist will hold the resulting
words.
The while loop reads words until it finds the end of
the input stream. This is detected using the default constructor for istreambuf_iterator,
which produces the past-the-end iterator object end. Thus, if you want
to test to make sure you re not at the end of the stream, you simply say p
!= end.
The second type of iterator that s used here is the insert_iterator, which you saw previously. This inserts objects into a container. Here, the
container is the string called word, which, for the purposes of
insert_iterator, behaves like a container. The constructor for insert_iterator
requires the container and an iterator indicating where it should start
inserting the characters. You could also use a back_insert_iterator, which requires that the container have a push_back( ) (string does).
After the while loop sets everything up, it begins by
looking for the first alpha character, incrementing start until that
character is found. It then copies characters from one iterator to the other,
stopping when a nonalpha character is found. Each word, assuming it is
nonempty, is added to wordlist.
Thinking in C++ Vol 2 - Practical Programming |
Prev |
Home |
Next |