Thinking in C++ - Practical Programming

Thinking in C++ Vol 2 - Practical Programming

Comparing strings

Comparing strings is inherently different from comparing numbers. Numbers have constant, universally meaningful values. To evaluate the relationship between the magnitudes of two strings, you must make a lexical comparison. Lexical comparison means that when you test a character to see if it is greater than or less than another character, you are actually comparing the numeric representations of those characters as specified in the collating sequence of the character set being used. Most often this will be the ASCII collating sequence, which assigns the printable characters for the English language numbers in the range 32 through 126 decimal. In the ASCII collating sequence, the first character in the list is the space, followed by several common punctuation marks, the digits, and then the uppercase and lowercase letters. With respect to the alphabet, this means that the letters nearer the front have lower ASCII values than those nearer the end, and that every uppercase letter has a lower value than every lowercase letter. With these details in mind, it becomes easier to remember that when a lexical comparison reports that s1 is greater than s2, it simply means that when the two were compared, the first differing character in s1 came later in the collating sequence than the character in that same position in s2.

C++ provides several ways to compare strings, and each has advantages. The simplest to use are the nonmember, overloaded operator functions: operator ==, operator !=, operator >, operator <, operator >=, and operator <=.

//: C03:CompStr.h
#ifndef COMPSTR_H
#define COMPSTR_H
#include <string>
#include "../TestSuite/Test.h"
using std::string;

class CompStrTest : public TestSuite::Test {
public:
  void run() {
    // Strings to compare
    string s1("This");
    string s2("That");
    test_(s1 == s1);
    test_(s1 != s2);
    test_(s1 > s2);
    test_(s1 >= s2);
    test_(s1 >= s1);
    test_(s2 < s1);
    test_(s2 <= s1);
    test_(s1 <= s1);
  }
};
#endif // COMPSTR_H ///:~

//: C03:CompStr.cpp
//{L} ../TestSuite/Test
#include "CompStr.h"

int main() {
  CompStrTest t;
  t.run();
  return t.report();
} ///:~

The overloaded comparison operators are useful for comparing both full strings and individual string character elements.

Notice in the following example the flexibility of argument types on both the left and right side of the comparison operators. For efficiency, the string class provides overloaded operators for the direct comparison of string objects, quoted literals, and pointers to C-style strings without having to create temporary string objects.

//: C03:Equivalence.cpp
#include <iostream>
#include <string>
using namespace std;

int main() {
  string s2("That"), s1("This");
  // The left operand is a quoted literal
  // and the right operand is a string:
  if("That" == s2)
    cout << "A match" << endl;
  // The left operand is a string and the right is
  // a pointer to a C-style null terminated string:
  if(s1 != s2.c_str())
    cout << "No match" << endl;
} ///:~

The c_str( ) function returns a const char* that points to a C-style, null-terminated string equivalent to the contents of the string object. This comes in handy when you want to pass a string to a standard C function, such as atoi( ) or any of the functions defined in the <cstring> header. It is an error to use the value returned by c_str( ) as a non-const argument to any function.

You won't find the logical not (!) or the logical comparison operators (&& and ||) among operators for a string. (Neither will you find overloaded versions of the bitwise C operators &, |, ^, or ~.) The overloaded nonmember comparison operators for the string class are limited to the subset that has clear, unambiguous application to single characters or groups of characters.

The compare( ) member function offers you a great deal more sophisticated and precise comparison than the nonmember operator set. It provides overloaded versions to compare:

Two complete strings.

Part of either string to a complete string.

Subsets of two strings.

The following example compares complete strings:

//: C03:Compare.cpp
// Demonstrates compare() and swap().
#include <cassert>
#include <string>
using namespace std;

int main() {
  string first("This");
  string second("That");
  assert(first.compare(first) == 0);
  assert(second.compare(second) == 0);
  // Which is lexically greater?
  assert(first.compare(second) > 0);
  assert(second.compare(first) < 0);
  first.swap(second);
  assert(first.compare(second) < 0);
  assert(second.compare(first) > 0);
} ///:~

The swap( ) function in this example does what its name implies: it exchanges the contents of its object and argument. To compare a subset of the characters in one or both strings, you add arguments that define where to start the comparison and how many characters to consider. For example, we can use the following overloaded version of compare( ):

s1.compare(s1StartPos, s1NumberChars, s2, s2StartPos,
s2NumberChars);

Here's an example:

//: C03:Compare2.cpp
// Illustrate overloaded compare().
#include <cassert>
#include <string>
using namespace std;

int main() {
  string first("This is a day that will live in infamy");
  string second("I don't believe that this is what "
                "I signed up for");
  // Compare "his is " (seven characters, including the
  // trailing space) in both strings:
  assert(first.compare(1, 7, second, 22, 7) == 0);
  // Compare "his is a " to "his is wh":
  assert(first.compare(1, 9, second, 22, 9) < 0);
} ///:~

In the examples so far, we have used C-style array indexing syntax to refer to an individual character in a string. C++ strings provide an alternative to the s[n] notation: the at( ) member. These two indexing mechanisms produce the same result in C++ if all goes well:

//: C03:StringIndexing.cpp
#include <cassert>
#include <string>
using namespace std;

int main() {
  string s("1234");
  assert(s[1] == '2');
  assert(s.at(1) == '2');
} ///:~

There is one important difference, however, between [ ] and at( ). When you try to reference an array element that is out of bounds, at( ) will do you the kindness of throwing an exception, while ordinary [ ] subscripting syntax will leave you to your own devices:

//: C03:BadStringIndexing.cpp
#include <exception>
#include <iostream>
#include <string>
using namespace std;

int main() {
  string s("1234");
  // at() saves you by throwing an exception:
  try {
    s.at(5);
  } catch(exception& e) {
    cerr << e.what() << endl;
  }
} ///:~

Responsible programmers will not use errant indexes, but should you want the benefits of automatic index checking, using at( ) in place of [ ] will give you a chance to gracefully recover from references to array elements that don't exist. Execution of this program on one of our test compilers gave the following output:

invalid string position

The at( ) member throws an object of class out_of_range, which derives (ultimately) from std::exception. By catching this object in an exception handler, you can take appropriate remedial actions such as recalculating the offending subscript or growing the array. Using string::operator[ ]( ) gives no such protection and is as dangerous as char array processing in C.[37]
