Linxutopia - Thinking in C++ - 4: Data Abstraction

The basic object

Step one is exactly that. C++ functions can be placed inside structs as “member functions.” Here’s what it looks like after converting the C version of CStash to the C++ Stash:

//: C04:CppLib.h
// C-like library converted to C++

struct Stash {
  int size;      // Size of each space
  int quantity;  // Number of storage spaces
  int next;      // Next empty space
   // Dynamically allocated array of bytes:
  unsigned char* storage;
  // Functions!
  void initialize(int size);
  void cleanup();
  int add(const void* element);
  void* fetch(int index);
  int count();
  void inflate(int increase);
}; ///:~

First, notice there is no typedef. Instead of requiring you to create a typedef, the C++ compiler turns the name of the structure into a new type name for the program (just as int, char, float and double are type names).
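As a small illustration of the struct name acting as a type name, here is a sketch that uses a hypothetical Point struct (it is not part of the Stash library, and the function names are made up for this example):

```cpp
#include <cassert>

// Hypothetical example type; not part of the Stash library.
struct Point {
  int x;
  int y;
};

// In C you would write "struct Point" or add a typedef;
// in C++ the struct name alone is the type name.
Point makePoint(int x, int y) {
  Point p;
  p.x = x;
  p.y = y;
  return p;
}

// Uses Point as a local variable type, again with no typedef.
int sumOfCoordinates() {
  Point p = makePoint(3, 4);
  assert(p.x == 3 && p.y == 4);
  return p.x + p.y;
}
```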

All the data members are exactly the same as before, but now the functions are inside the body of the struct. In addition, notice that the first argument from the C version of the library has been removed. In C++, instead of forcing you to pass the address of the structure as the first argument to all the functions that operate on that structure, the compiler secretly does this for you. Now the only arguments for the functions are concerned with what the function does, not the mechanism of the function’s operation.

It’s important to realize that the function code is effectively the same as it was with the C version of the library. The number of arguments is the same (even though you don’t see the structure address being passed in, it’s still there), and there’s only one function body for each function. That is, just because you say

Stash A, B, C;

doesn’t mean you get a different add( ) function for each variable.
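To make the hidden argument more concrete, the following sketch places a member function next to a hand-written C-style equivalent. The Counter struct and every function name here are hypothetical, chosen only for illustration; the point is that several objects each hold their own data while sharing one function body.

```cpp
#include <cassert>

// Hypothetical type for illustration; not part of the Stash library.
struct Counter {
  int value;
  void reset();
  int bump(int by);
};

void Counter::reset() { value = 0; }

// One body of code serves every Counter. Which object it works on
// is determined by the address the compiler passes in for you.
int Counter::bump(int by) {
  value += by;
  return value;
}

// Roughly what you would have written by hand in C: the same body,
// with the structure address as an explicit first argument.
int Counter_bump(Counter* self, int by) {
  self->value += by;
  return self->value;
}

// Three objects, three separate copies of the data, one bump().
int sharedCodeDemo() {
  Counter a, b, c;
  a.reset(); b.reset(); c.reset();
  a.bump(1);
  b.bump(10);
  Counter_bump(&c, 100);  // Same effect as c.bump(100)
  assert(a.value == 1 && b.value == 10 && c.value == 100);
  return a.value + b.value + c.value;
}
```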

So the code that’s generated is almost identical to what you would have written for the C version of the library. Interestingly enough, this includes the “name decoration” you probably would have done to produce Stash_initialize( ), Stash_cleanup( ), and so on. When the function name is inside the struct, the compiler effectively does the same thing. Therefore, initialize( ) inside the structure Stash will not collide with a function named initialize( ) inside any other structure, or even a global function named initialize( ). Most of the time you don’t have to worry about the function name decoration – you use the undecorated name. But sometimes you do need to be able to specify that this initialize( ) belongs to the struct Stash, and not to any other struct. In particular, when you’re defining the function you need to fully specify which one it is. To accomplish this full specification, C++ has an operator (::) called the scope resolution operator (named so because names can now be in different scopes: at global scope or within the scope of a struct). For example, if you want to specify initialize( ), which belongs to Stash, you say Stash::initialize(int size). You can see how the scope resolution operator is used in the function definitions:

//: C04:CppLib.cpp {O}
// C library converted to C++
// Declare structure and functions:
#include "CppLib.h"
#include <iostream>
#include <cassert>
using namespace std;
// Quantity of elements to add
// when increasing storage:
const int increment = 100;

void Stash::initialize(int sz) {
  size = sz;
  quantity = 0;
  storage = 0;
  next = 0;
}

int Stash::add(const void* element) {
  if(next >= quantity) // Enough space left?
    inflate(increment);
  // Copy element into storage,
  // starting at next empty space:
  int startBytes = next * size;
  unsigned char* e = (unsigned char*)element;
  for(int i = 0; i < size; i++)
    storage[startBytes + i] = e[i];
  next++;
  return(next - 1); // Index number
}

void* Stash::fetch(int index) {
  // Check index boundaries:
  assert(0 <= index);
  if(index >= next)
    return 0; // To indicate the end
  // Produce pointer to desired element:
  return &(storage[index * size]);
}

int Stash::count() {
  return next; // Number of elements in Stash
}

void Stash::inflate(int increase) {
  assert(increase > 0);
  int newQuantity = quantity + increase;
  int newBytes = newQuantity * size;
  int oldBytes = quantity * size;
  unsigned char* b = new unsigned char[newBytes];
  for(int i = 0; i < oldBytes; i++)
    b[i] = storage[i]; // Copy old to new
  delete []storage; // Old storage
  storage = b; // Point to new memory
  quantity = newQuantity;
}

void Stash::cleanup() {
  if(storage != 0) {
    cout << "freeing storage" << endl;
    delete []storage;
  }
} ///:~
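The name-collision point made earlier can be seen in isolation with a smaller case. In this sketch, two hypothetical structs (Tank and Timer) and a global function all use the name initialize( ), and the scope resolution operator tells the compiler which one each definition belongs to. None of these names come from the library.

```cpp
#include <cassert>

// Hypothetical structs for illustration only.
struct Tank {
  int level;
  void initialize(int start);
};

struct Timer {
  int ticks;
  void initialize(int start);
};

int globalCalls = 0;

// A global function with the same undecorated name.
void initialize(int) { ++globalCalls; }

// The scope resolution operator says which initialize() is defined.
void Tank::initialize(int start) { level = start; }

void Timer::initialize(int start) { ticks = start * 2; }

// Each call reaches a different function, with no collision.
int scopeDemo() {
  globalCalls = 0;
  Tank tank;
  Timer timer;
  tank.initialize(5);   // Tank::initialize
  timer.initialize(5);  // Timer::initialize
  initialize(5);        // The global function
  assert(tank.level == 5 && timer.ticks == 10);
  return tank.level + timer.ticks + globalCalls;
}
```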

There are several other things that are different between C and C++. First, the declarations in the header files are required by the compiler. In C++ you cannot call a function without declaring it first. The compiler will issue an error message otherwise. This is an important way to ensure that function calls are consistent between the point where they are called and the point where they are defined. By forcing you to declare the function before you call it, the C++ compiler virtually ensures that you will perform this declaration by including the header file. If you also include the same header file in the place where the functions are defined, then the compiler checks to make sure that the declaration in the header and the function definition match up. This means that the header file becomes a validated repository for function declarations and ensures that functions are used consistently throughout all translation units in the project.

Of course, global functions can still be declared by hand every place where they are defined and used. (This is so tedious that it becomes very unlikely.) However, structures must always be declared before they are defined or used, and the most convenient place to put a structure definition is in a header file, except for those you intentionally hide in a file.

You can see that all the member functions look almost the same as when they were C functions, except for the scope resolution and the fact that the first argument from the C version of the library is no longer explicit. It’s still there, of course, because the function has to be able to work on a particular struct variable. But notice, inside the member function, that the member selection is also gone! Thus, instead of saying s->size = sz; you say size = sz; and eliminate the tedious s->, which didn’t really add anything to the meaning of what you were doing anyway. The C++ compiler is apparently doing this for you. Indeed, it is taking the “secret” first argument (the address of the structure that we were previously passing in by hand) and applying the member selector whenever you refer to one of the data members of a struct. This means that whenever you are inside a member function of a struct, you can refer to any member of that struct (including another member function) by simply giving its name. The compiler will search through the local structure’s names before looking for a global version of that name. You’ll find that this feature means that not only is your code easier to write, it’s a lot easier to read.
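The lookup order can be checked with a short sketch. It assumes a hypothetical Box struct whose data member has the same name as a global variable. Inside the member functions the plain name refers to the member, and a leading :: (the scope resolution operator with nothing in front of it) reaches the global.

```cpp
#include <cassert>

int width = -1;  // Global with the same name as a data member

// Hypothetical struct for illustration only.
struct Box {
  int width;
  void set(int w);
  int doubled();
  int globalWidth();
};

void Box::set(int w) {
  width = w;  // The member is found before the global
}

int Box::doubled() {
  set(width * 2);  // Another member function, called by plain name
  return width;
}

int Box::globalWidth() {
  return ::width;  // A leading :: selects the global instead
}

int lookupDemo() {
  Box b;
  b.set(4);
  int d = b.doubled();
  assert(b.globalWidth() == -1);  // The global was never touched
  return d;
}
```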

But what if, for some reason, you want to be able to get your hands on the address of the structure? In the C version of the library it was easy because each function’s first argument was a CStash* called s. In C++, things are even more consistent. There’s a special keyword, called this, which produces the address of the struct. It’s the equivalent of the ‘s’ in the C version of the library. So we can revert to the C style of things by saying

this->size = sz;

The code generated by the compiler is exactly the same, so you don’t need to use this in such a fashion; occasionally, you’ll see code where people explicitly use this-> everywhere but it doesn’t add anything to the meaning of the code and often indicates an inexperienced programmer. Usually, you don’t use this often, but when you need it, it’s there (some of the examples later in the book will use this).
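Two situations where this does earn its keep are sketched below with a hypothetical Node struct: handing out the address of the object, and reaching a data member that a parameter of the same name has hidden. The names are invented for the example.

```cpp
#include <cassert>

// Hypothetical struct for illustration only.
struct Node {
  int id;
  void setId(int id);
  Node* address();
};

void Node::setId(int id) {
  // The parameter hides the member, so this-> is needed here:
  this->id = id;
}

// this is the address of the object the function was called for.
Node* Node::address() { return this; }

bool thisDemo() {
  Node n;
  n.setId(7);
  assert(n.id == 7);
  return n.address() == &n;
}
```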

There’s one last item to mention. In C, you could assign a void* to any other pointer like this:

int i = 10;
void* vp = &i; // OK in both C and C++
int* ip = vp; // Only acceptable in C

and there was no complaint from the compiler. But in C++, this statement is not allowed. Why? Because C is not so particular about type information, so it allows you to assign a pointer with an unspecified type to a pointer with a specified type. Not so with C++. Type is critical in C++, and the compiler stamps its foot when there are any violations of type information. This has always been important, but it is especially important in C++ because you have member functions in structs. If you could pass pointers to structs around with impunity in C++, then you could end up calling a member function for a struct that doesn’t even logically exist for that struct! A real recipe for disaster. Therefore, while C++ allows the assignment of any type of pointer to a void* (this was the original intent of void*, which is required to be large enough to hold a pointer to any type), it will not allow you to assign a void pointer to any other type of pointer. A cast is always required to tell the reader and the compiler that you really do want to treat it as the destination type.
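Here is a minimal sketch of the cast that C++ requires. It uses static_cast, which is one way to spell the cast in C++; the listings in this section use the older C-style form (int*) instead. The function name is made up for the example.

```cpp
#include <cassert>

int readThroughVoid() {
  int i = 10;
  void* vp = &i;                    // Implicit: any object pointer converts to void*
  // int* ip = vp;                  // Error in C++: no implicit conversion back
  int* ip = static_cast<int*>(vp);  // The explicit cast states the intent
  assert(ip == &i);
  *ip += 1;                         // Modifies i through the recovered pointer
  return i;
}
```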

This brings up an interesting issue. One of the important goals for C++ is to compile as much existing C code as possible to allow for an easy transition to the new language. However, this doesn’t mean any code that C allows will automatically be allowed in C++. There are a number of things the C compiler lets you get away with that are dangerous and error-prone. (We’ll look at them as the book progresses.) The C++ compiler generates warnings and errors for these situations. This is often much more of an advantage than a hindrance. In fact, there are many situations in which you are trying to run down an error in C and just can’t find it, but as soon as you recompile the program in C++, the compiler points out the problem! In C, you’ll often find that you can get the program to compile, but then you have to get it to work. In C++, when the program compiles correctly, it often works, too! This is because the language is a lot stricter about type.

You can see a number of new things in the way the C++ version of Stash is used in the following test program:

//: C04:CppLibTest.cpp
//{L} CppLib
// Test of C++ library
#include "CppLib.h"
#include "../require.h"
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main() {
  Stash intStash;
  intStash.initialize(sizeof(int));
  for(int i = 0; i < 100; i++)
    intStash.add(&i);
  for(int j = 0; j < intStash.count(); j++)
    cout << "intStash.fetch(" << j << ") = "
         << *(int*)intStash.fetch(j)
         << endl;
  // Holds 80-character strings:
  Stash stringStash;
  const int bufsize = 80;
  stringStash.initialize(sizeof(char) * bufsize);
  ifstream in("CppLibTest.cpp");
  assure(in, "CppLibTest.cpp");
  string line;
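  while(getline(in, line)) {
    // Copy into a fixed-size, zero-filled buffer so that add()
    // always reads exactly bufsize valid bytes:
    char buf[bufsize] = {};
    line.copy(buf, bufsize - 1); // Truncates long lines
    stringStash.add(buf);
  }
  int k = 0;
  char* cp;
  while((cp = (char*)stringStash.fetch(k)) != 0) {
    cout << "stringStash.fetch(" << k << ") = "
         << cp << endl;
    k++;
  }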
  intStash.cleanup();
  stringStash.cleanup();
} ///:~

One thing you’ll notice is that the variables are all defined “on the fly” (as introduced in the previous chapter). That is, they are defined at any point in the scope, rather than being restricted – as in C – to the beginning of the scope.

The code is quite similar to CLibTest.cpp, but when a member function is called, the call occurs using the member selection operator ‘.’ preceded by the name of the variable. This is a convenient syntax because it mimics the selection of a data member of the structure. The difference is that this is a function member, so it has an argument list.

Of course, the call that the compiler actually generates looks much more like the original C library function. Thus, considering name decoration and the passing of this, the C++ function call intStash.initialize(sizeof(int)) becomes something like Stash_initialize(&intStash, sizeof(int)). If you ever wonder what’s going on underneath the covers, remember that the original C++ compiler cfront from AT&T produced C code as its output, which was then compiled by the underlying C compiler. This approach meant that cfront could be quickly ported to any machine that had a C compiler, and it helped to rapidly disseminate C++ compiler technology. But because the C++ compiler had to generate C, you know that there must be some way to represent C++ syntax in C (some compilers still allow you to produce C code).

There’s one other change from CLibTest.cpp, which is the introduction of the require.h header file. This is a header file that I created for this book to perform more sophisticated error checking than that provided by assert( ). It contains several functions, including the one used here called assure( ), which is used for files. This function checks to see if the file has successfully been opened, and if not it reports to standard error that the file could not be opened (thus it needs the name of the file as the second argument) and exits the program. The require.h functions will be used throughout the book, in particular to ensure that there are the right number of command-line arguments and that files are opened properly. The require.h functions replace repetitive and distracting error-checking code, and yet they provide useful error messages. These functions will be fully explained later in the book.
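Until then, the behavior just described can be approximated with a few lines. The following is only a sketch under that description: the name assureSketch( ), the message text, and the demo file name are assumptions, and the real require.h may differ in its details.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <string>

// A sketch only; the real assure() in require.h may differ.
inline void assureSketch(std::ifstream& in,
                         const std::string& filename) {
  if(!in) {
    // Report to standard error, then exit the program:
    std::fprintf(stderr, "Could not open file %s\n",
                 filename.c_str());
    std::exit(1);
  }
}

// Writes a small file, reopens it, and checks it with
// assureSketch(). The file name is arbitrary.
bool assureDemo() {
  const std::string name = "assure_demo.txt";
  {
    std::ofstream out(name.c_str());
    out << "hello\n";
  } // out is closed here
  std::ifstream in(name.c_str());
  assureSketch(in, name);  // Returns normally if the open succeeded
  assert(in.is_open());
  std::string line;
  std::getline(in, line);
  return line == "hello";
}
```

If the file cannot be opened, assureSketch( ) never returns, so the code after the call can rely on the stream being usable.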
