Thinking in C++ Vol 2 - Practical Programming |
The program Find.cpp earlier in this chapter leads us
to ask the obvious question: Why isn't case-insensitive comparison part of the
standard string class? The answer provides interesting background on the
true nature of C++ string objects.
Consider what it means for a character to have case.
Written Hebrew, Farsi, and Kanji don't use the concept of upper- and lowercase,
so for those languages this idea has no meaning. It would seem that if there
were a way to designate some languages as all uppercase or all lowercase,
we could design a generalized solution. However, some languages that employ the
concept of case also change the meaning of particular characters with
diacritical marks, for example: the tilde in Spanish, the circumflex in
French, and the umlaut in German. For this reason, any case-sensitive collating
scheme that attempts to be comprehensive will be nightmarishly complex to use.
Although we usually treat the C++ string as a class,
this is really not the case. The string type is a specialization of a
more general constituent, the basic_string< >
template. Observe how string is declared in the Standard C++ header file:
typedef basic_string<char> string;
To understand the nature of the string class, look at the basic_string< >
template:
template<class charT, class traits = char_traits<charT>,
  class Allocator = allocator<charT> > class basic_string;
In Chapter 5, we examine templates in great detail (much
more than in Chapter 16 of Volume 1). For now, just notice that the string
type is created when the basic_string template is instantiated with char.
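The relationship between string and basic_string< > can be checked by the compiler itself. The following is a minimal sketch, assuming a C++11 or later compiler (for static_assert and <type_traits>); the helper name string_is_basic_string_of_char is hypothetical and exists only for this illustration:

```cpp
#include <memory>
#include <string>
#include <type_traits>

// Compile-time check: std::string is basic_string instantiated with char
// and the default traits and allocator arguments.
static_assert(
    std::is_same<std::string,
                 std::basic_string<char, std::char_traits<char>,
                                   std::allocator<char> > >::value,
    "string should be basic_string<char>");

// Hypothetical helper so the same fact can also be checked at run time.
inline bool string_is_basic_string_of_char() {
  return std::is_same<std::string, std::basic_string<char> >::value;
}
```

If the two types ever differed, the static_assert would stop compilation, so simply compiling this fragment confirms the typedef shown above.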
Inside the basic_string< > template declaration, the
line:
class traits = char_traits<charT>,
tells us that the behavior of the class made from the basic_string< >
template is specified by a class based on the template char_traits< >.
Thus, the basic_string< > template produces
string-oriented classes that manipulate types other than char (wide
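characters, for example). To do this, the char_traits< > template
controls the content and collating behaviors of a variety of character sets
using the character comparison functions eq( ) (equal) and lt( ) (less than).
The header later in this section also defines ne( ) (not equal), although the
Standard's char_traits< > interface has no such member, so basic_string< >
never calls it. The basic_string< > string comparison functions rely on
eq( ) and lt( ), along with the traits functions compare( ) and find( ).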
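To see these traits functions in isolation, they can be called directly as static members of std::char_traits<char>. This is only an illustrative sketch; the wrapper names (same_char, char_less, compare_prefix, find_char) and the CT typedef are hypothetical and not part of the library:

```cpp
#include <cstddef>
#include <string>

typedef std::char_traits<char> CT;  // the default traits for char

// Each helper forwards to one static member of std::char_traits<char>.
inline bool same_char(char a, char b) { return CT::eq(a, b); }
inline bool char_less(char a, char b) { return CT::lt(a, b); }

// compare() yields a negative, zero, or positive value
// for the first n characters of the two sequences.
inline int compare_prefix(const char* a, const char* b, std::size_t n) {
  return CT::compare(a, b, n);
}

// find() returns a pointer to the first match in [s, s + n),
// or a null pointer if c does not occur there.
inline const char* find_char(const char* s, std::size_t n, char c) {
  return CT::find(s, n, c);
}
```

With the default traits, same_char('a', 'A') is false; that is exactly the behavior a replacement traits class would change.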
This is why the string class doesn't include
case-insensitive member functions: that's not in its job description. To change
the way the string class treats character comparison, you must supply a
different char_traits< > template because that defines
the behavior of the individual character comparison member functions.
You can use this information to make a new type of string
class that ignores case. First, we'll define a new case-insensitive traits
class, ichar_traits, that inherits from the existing char_traits<char>.
Next, we'll redefine only the members we need to change to make
character-by-character comparison case insensitive. (In addition to the three
lexical character comparison members mentioned earlier, we'll also supply a new
implementation for the char_traits functions find( ) and compare( ).)
Finally, we'll typedef a new class based on basic_string, but using the
case-insensitive ichar_traits class for its second argument:
//: C03:ichar_traits.h
// Creating your own character traits.
#ifndef ICHAR_TRAITS_H
#define ICHAR_TRAITS_H
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <ostream>
#include <string>
using std::allocator;
using std::basic_string;
using std::char_traits;
using std::ostream;
using std::size_t;
using std::string;
using std::toupper;
using std::tolower;

struct ichar_traits : char_traits<char> {
  // We'll only change character-by-
  // character comparison functions
  static bool eq(char c1st, char c2nd) {
    return toupper(c1st) == toupper(c2nd);
  }
  static bool ne(char c1st, char c2nd) {
    return !eq(c1st, c2nd);
  }
  static bool lt(char c1st, char c2nd) {
    return toupper(c1st) < toupper(c2nd);
  }
  static int
  compare(const char* str1, const char* str2, size_t n) {
    for(size_t i = 0; i < n; ++i) {
      if(str1 == 0)
        return -1;
      else if(str2 == 0)
        return 1;
      else if(tolower(*str1) < tolower(*str2))
        return -1;
      else if(tolower(*str1) > tolower(*str2))
        return 1;
      assert(tolower(*str1) == tolower(*str2));
      ++str1; ++str2; // Compare the other chars
    }
    return 0;
  }
  static const char*
  find(const char* s1, size_t n, char c) {
    while(n-- > 0)
      if(toupper(*s1) == toupper(c))
        return s1;
      else
        ++s1;
    return 0;
  }
};

typedef basic_string<char, ichar_traits> istring;

inline ostream& operator<<(ostream& os,
  const istring& s) {
  return os << string(s.c_str(), s.length());
}
#endif // ICHAR_TRAITS_H ///:~
We provide a typedef named istring so that our
class will act like an ordinary string in every way, except that it will
make all comparisons without respect to case. For convenience, we've also
provided an overloaded operator<<( ) so that you can print istrings.
Here's an example:
//: C03:ICompare.cpp
#include <cassert>
#include <iostream>
#include "ichar_traits.h"
using namespace std;

int main() {
  // The same letters except for case:
  istring first = "tHis";
  istring second = "ThIS";
  cout << first << endl;
  cout << second << endl;
  assert(first.compare(second) == 0);
  assert(first.find('h') == 1);
  assert(first.find('I') == 2);
  assert(first.find('x') == string::npos);
} ///:~
This is just a toy example. To make istring fully
equivalent to string, we'd have to create the other functions necessary
to support the new istring type.
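One gap is that a string built on custom traits is a different type from string, so the two cannot be assigned to or compared with each other directly. The following is a minimal sketch of one way to bridge them, not the book's method: the names ci_traits, ci_string, to_ci, to_std, and same_ignoring_case are hypothetical, and the traits class here is a separate illustration rather than the ichar_traits header above. It also assumes you want to route characters through unsigned char before calling toupper( ), which avoids passing negative char values to that function:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical traits class for this sketch (not the header's ichar_traits).
struct ci_traits : std::char_traits<char> {
  // Convert through unsigned char so negative char values
  // are never passed to std::toupper.
  static unsigned char up(char c) {
    return static_cast<unsigned char>(
      std::toupper(static_cast<unsigned char>(c)));
  }
  static bool eq(char a, char b) { return up(a) == up(b); }
  static bool lt(char a, char b) { return up(a) < up(b); }
  static int compare(const char* a, const char* b, std::size_t n) {
    for(std::size_t i = 0; i < n; ++i) {
      if(lt(a[i], b[i])) return -1;
      if(lt(b[i], a[i])) return 1;
    }
    return 0;
  }
  static const char* find(const char* s, std::size_t n, char c) {
    for(std::size_t i = 0; i < n; ++i)
      if(eq(s[i], c)) return s + i;
    return 0;
  }
};

typedef std::basic_string<char, ci_traits> ci_string;

// The two string types are unrelated, so conversion copies the characters.
inline ci_string to_ci(const std::string& s) {
  return ci_string(s.data(), s.size());
}
inline std::string to_std(const ci_string& s) {
  return std::string(s.data(), s.size());
}

// Compare two ordinary strings without regard to case by
// converting both and letting the traits do the comparison.
inline bool same_ignoring_case(const std::string& a, const std::string& b) {
  return to_ci(a) == to_ci(b);
}
```

A call such as same_ignoring_case("tHis", "ThIS") then lets ordinary string code borrow the case-insensitive comparison without changing its own types, at the cost of a copy for each conversion.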
The <string> header provides a wide string
class via the following typedef:
typedef basic_string<wchar_t> wstring;
Wide string support also reveals itself in wide streams
(wostream in place of ostream, also defined in <iostream>)
and in the header <cwctype>, a wide-character version of <cctype>.
This along with the wchar_t specialization of char_traits in the
standard library allows us to do a wide-character version of ichar_traits:
//: C03:iwchar_traits.h {-g++}
// Creating your own wide-character traits.
#ifndef IWCHAR_TRAITS_H
#define IWCHAR_TRAITS_H
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cwctype>
#include <ostream>
#include <string>
using std::allocator;
using std::basic_string;
using std::char_traits;
using std::size_t;
using std::towlower;
using std::towupper;
using std::wostream;
using std::wstring;

struct iwchar_traits : char_traits<wchar_t> {
  // We'll only change character-by-
  // character comparison functions
  static bool eq(wchar_t c1st, wchar_t c2nd) {
    return towupper(c1st) == towupper(c2nd);
  }
  static bool ne(wchar_t c1st, wchar_t c2nd) {
    return towupper(c1st) != towupper(c2nd);
  }
  static bool lt(wchar_t c1st, wchar_t c2nd) {
    return towupper(c1st) < towupper(c2nd);
  }
  static int compare(
    const wchar_t* str1, const wchar_t* str2, size_t n) {
    for(size_t i = 0; i < n; i++) {
      if(str1 == 0)
        return -1;
      else if(str2 == 0)
        return 1;
      else if(towlower(*str1) < towlower(*str2))
        return -1;
      else if(towlower(*str1) > towlower(*str2))
        return 1;
      assert(towlower(*str1) == towlower(*str2));
      ++str1; ++str2; // Compare the other wchar_ts
    }
    return 0;
  }
  static const wchar_t*
  find(const wchar_t* s1, size_t n, wchar_t c) {
    while(n-- > 0)
      if(towupper(*s1) == towupper(c))
        return s1;
      else
        ++s1;
    return 0;
  }
};

typedef basic_string<wchar_t, iwchar_traits> iwstring;

inline wostream& operator<<(wostream& os,
  const iwstring& s) {
  return os << wstring(s.c_str(), s.length());
}
#endif // IWCHAR_TRAITS_H ///:~
As you can see, this is mostly an exercise in placing a w
in the appropriate place in the source code. The test program looks like this:
//: C03:IWCompare.cpp {-g++}
#include <cassert>
#include <iostream>
#include "iwchar_traits.h"
using namespace std;

int main() {
  // The same letters except for case:
  iwstring wfirst = L"tHis";
  iwstring wsecond = L"ThIS";
  wcout << wfirst << endl;
  wcout << wsecond << endl;
  assert(wfirst.compare(wsecond) == 0);
  assert(wfirst.find('h') == 1);
  assert(wfirst.find('I') == 2);
  assert(wfirst.find('x') == wstring::npos);
} ///:~
Unfortunately, some compilers still do not provide robust
support for wide characters.