Thinking in C++ Vol 2 - Practical Programming |
Prev |
Home |
Next |
The word list examples use different approaches to extract
tokens from a stream, neither of which is very flexible. Since the STL
containers and algorithms all revolve around iterators, the most flexible
solution will itself use an iterator. You could think of the TokenIterator
as an iterator that wraps itself around any other iterator that can produce characters.
Because it is certainly a type of input iterator (the most primitive type of
iterator), it can provide input to any STL algorithm. Not only is it a useful
tool in itself, the following TokenIterator is also a good example of
how you can design your own iterators.
The TokenIterator class is doubly flexible. First,
you can choose the type of iterator that will produce the char input.
Second, instead of just saying what characters represent the delimiters, TokenIterator
will use a predicate that is a function object whose operator( )
takes a char and decides whether it should be in the token. Although the
two examples given here have a static concept of what characters belong in a
token, you could easily design your own function object to change its state as
the characters are read, producing a more sophisticated parser.
The following header file contains two basic predicates, Isalpha
and Delimiters, along with the template for TokenIterator:
//: C07:TokenIterator.h
#ifndef TOKENITERATOR_H
#define TOKENITERATOR_H
#include <algorithm>
#include <cctype>
#include <functional>
#include <iterator>
#include <string>
struct Isalpha : std::unary_function<char, bool>
{
bool operator()(char c) { return std::isalpha(c); }
};
class Delimiters : std::unary_function<char,
bool> {
std::string exclude;
public:
Delimiters() {}
Delimiters(const std::string& excl) :
exclude(excl) {}
bool operator()(char c) {
return exclude.find(c) == std::string::npos;
}
};
template<class InputIter, class Pred = Isalpha>
class TokenIterator : public std::iterator<
std::input_iterator_tag, std::string,
std::ptrdiff_t> {
InputIter first;
InputIter last;
std::string word;
Pred predicate;
public:
TokenIterator(InputIter begin, InputIter end,
Pred pred = Pred())
: first(begin), last(end), predicate(pred) {
++*this;
}
TokenIterator() {} // End sentinel
// Prefix increment:
TokenIterator& operator++() {
word.resize(0);
first = std::find_if(first, last, predicate);
while(first != last && predicate(*first))
word += *first++;
return *this;
}
// Postfix increment
class CaptureState {
std::string word;
public:
CaptureState(const std::string& w) : word(w) {}
std::string operator*() { return word; }
};
CaptureState operator++(int) {
CaptureState d(word);
++*this;
return d;
}
// Produce the actual value:
std::string operator*() const { return word; }
const std::string* operator->() const { return
&word; }
// Compare iterators:
bool operator==(const TokenIterator&) {
return word.size() == 0 && first == last;
}
bool operator!=(const TokenIterator& rv) {
return !(*this == rv);
}
};
#endif // TOKENITERATOR_H ///:~
The TokenIterator class derives from the std::iterator
template. It might appear that some kind of functionality comes with std::iterator,
but it is purely a way of tagging an iterator, to tell a container that uses it
what it can do. Here, you can see input_iterator_tag as the iterator_category
template argument this tells anyone who asks that a TokenIterator only
has the capabilities of an input iterator and cannot be used with algorithms
requiring more sophisticated iterators. Apart from the tagging, std::iterator
doesn t do anything beyond providing several useful type definitions. You must implement
all other functionality yourself.
The TokenIterator class may look a little strange at
first, because the first constructor requires both a begin and an end
iterator as arguments, along with the predicate. Remember, this is a wrapper
iterator that has no idea how to tell when it s at the end of its input, so the
ending iterator is necessary in the first constructor. The reason for the
second (default) constructor is that the STL algorithms (and any algorithms you
write) need a TokenIterator sentinel to be the past-the-end value. Since
all the information necessary to see if the TokenIterator has reached
the end of its input is collected in the first constructor, this second
constructor creates a TokenIterator that is merely used as a placeholder
in algorithms.
The core of the behavior happens in operator++. This
erases the current value of word using string::resize( ) and
then finds the first character that satisfies the predicate (thus discovering
the beginning of the new token) using find_if( ). The resulting
iterator is assigned to first, thus moving first forward to the
beginning of the token. Then, as long as the end of the input is not reached
and the predicate is satisfied, input characters are copied into word.
Finally, the TokenIterator object is returned and must be dereferenced
to access the new token.
The postfix increment requires an object of type CaptureState
to hold the value before the increment, so it can be returned. Producing the
actual value is a straightforward operator*. The only other functions to
define for an output iterator are the operator== and operator!=
to indicate whether the TokenIterator has reached the end of its input.
You can see that the argument for operator== is ignored it only cares
about whether it has reached its internal last iterator. Notice that operator!=
is defined in terms of operator==.
A good test of TokenIterator includes a number of
different sources of input characters, including a streambuf_iterator, a
char*, and a deque<char>::iterator. Finally, the original
word list problem is solved:
//: C07:TokenIteratorTest.cpp {-g++}
#include <fstream>
#include <iostream>
#include <vector>
#include <deque>
#include <set>
#include "TokenIterator.h"
#include "../require.h"
using namespace std;
int main(int argc, char* argv[]) {
char* fname = "TokenIteratorTest.cpp";
if(argc > 1) fname = argv[1];
ifstream in(fname);
assure(in, fname);
ostream_iterator<string> out(cout,
"\n");
typedef istreambuf_iterator<char> IsbIt;
IsbIt begin(in), isbEnd;
Delimiters delimiters("
\t\n~;()\"<>:{}[]+-=&*#.,/\\");
TokenIterator<IsbIt, Delimiters>
wordIter(begin, isbEnd, delimiters), end;
vector<string> wordlist;
copy(wordIter, end, back_inserter(wordlist));
// Output results:
copy(wordlist.begin(), wordlist.end(), out);
*out++ =
"-----------------------------------";
// Use a char array as the source:
char* cp = "typedef
std::istreambuf_iterator<char> It";
TokenIterator<char*, Delimiters>
charIter(cp, cp + strlen(cp), delimiters), end2;
vector<string> wordlist2;
copy(charIter, end2, back_inserter(wordlist2));
copy(wordlist2.begin(), wordlist2.end(), out);
*out++ =
"-----------------------------------";
// Use a deque<char> as the source:
ifstream in2("TokenIteratorTest.cpp");
deque<char> dc;
copy(IsbIt(in2), IsbIt(), back_inserter(dc));
TokenIterator<deque<char>::iterator,Delimiters>
dcIter(dc.begin(), dc.end(), delimiters), end3;
vector<string> wordlist3;
copy(dcIter, end3, back_inserter(wordlist3));
copy(wordlist3.begin(), wordlist3.end(), out);
*out++ =
"-----------------------------------";
// Reproduce the Wordlist.cpp example:
ifstream in3("TokenIteratorTest.cpp");
TokenIterator<IsbIt, Delimiters>
wordIter2(IsbIt(in3), isbEnd, delimiters);
set<string> wordlist4;
while(wordIter2 != end)
wordlist4.insert(*wordIter2++);
copy(wordlist4.begin(), wordlist4.end(), out);
} ///:~
When using an istreambuf_iterator, you create one to
attach to the istream object and one with the default constructor as the
past-the-end marker. Both are used to create the TokenIterator that will
produce the tokens; the default constructor produces the faux TokenIterator
past-the-end sentinel. (This is just a placeholder and is ignored.) The
TokenIterator produces strings that are inserted into a container of
string here a vector<string> is used in all cases except
the last. (You could also concatenate the results onto a string.) Other
than that, a TokenIterator works like any other input iterator.
When defining a bidirectional (and therefore also a random
access) iterator, you can get reverse iterators for free by using the std::reverse_iterator
adaptor. If you have already defined an iterator for a container with
bidirectional capabilities, you can get a reverse iterator from your forward-traversing
iterator with lines like the following inside your container class:
// Assume "iterator" is your nested iterator type
typedef std::reverse_iterator<iterator>
reverse_iterator;
reverse_iterator rbegin() {return
reverse_iterator(end());
reverse_iterator rend() {return
reverse_iterator(begin());
The std::reverse_iterator adaptor does all the work
for you. For example, if you use the * operator to dereference your
reverse iterator, it automatically decrements a temporary copy of the forward
iterator it is holding in order to return the correct element, since reverse
iterators logically point one position past the element they refer to.
Thinking in C++ Vol 2 - Practical Programming |
Prev |
Home |
Next |