15:
Discovering Problems
Before C was tamed into ANSI C, we had a little joke: “My code compiles, so it should run!” (Ha ha!)
This was funny only if you understood C, because at that time the C compiler would accept just about anything; C was truly a “portable assembly language” created to see if it was possible to develop a portable operating system (Unix) that could be moved from one machine architecture to another without rewriting it from scratch in the new machine’s assembly language. So C was actually created as a side-effect of building Unix and not as a general-purpose programming language.
Because C was targeted at programmers who wrote operating systems in assembly language, it was implicitly assumed that those programmers knew what they were doing and didn’t need safety nets. For example, assembly-language programmers didn’t need the compiler to check argument types and usage, and if they decided to use a data type in a different way than it was originally intended, they certainly must have good reason to do so, and the compiler didn’t get in the way. Thus, getting your pre-ANSI C program to compile was only the first step in the long process of developing a bug-free program.
The development of ANSI C along with stronger rules about what the compiler would accept came after lots of people used C for projects other than writing operating systems, and after the appearance of C++, which greatly improved your chances of having a program run decently once it compiled. Much of this improvement came through strong static type checking: “strong” because the compiler prevented you from abusing the type, “static” because ANSI C and C++ perform type checking at compile time.
To many people (myself included), the improvement was so dramatic that it appeared that strong static type checking was the answer to a large portion of our problems. Indeed, one of the motivations for Java was that C++’s type checking wasn’t strong enough (primarily because C++ had to be backward-compatible with C, and so was chained to its limitations). Thus Java has gone even further to take advantage of the benefits of type checking, and since Java has language-checking mechanisms that exist at run time (C++ doesn’t; what’s left at run time is basically assembly language—very fast, but with no self-awareness), it isn’t restricted to only static type checking.[86]
It seems, however, that language-checking mechanisms can take us only so far in our quest to develop a correctly-working program. C++ gave us programs that worked a lot sooner than C programs, but often still had problems such as memory leaks and subtle, buried bugs. Java went a long way toward solving those problems, yet it’s still quite possible to write a Java program containing nasty bugs. In addition (despite the amazing performance claims always touted by the flaks at Sun), all the safety nets in Java added additional overhead, so sometimes we run into the challenge of getting our Java programs to run fast enough for a particular need (although it’s usually more important to have a working program than one that runs at a particular speed).
This chapter presents tools to solve the problems that the compiler doesn’t. In a sense, we are admitting that the compiler can take us only so far in the creation of robust programs, so we are moving beyond the compiler and creating a build system and code that know more about what a program is and isn’t supposed to do.
One of the biggest steps forward is the incorporation of automated unit testing. This means writing tests and incorporating those tests into a build system that compiles your code and runs the tests every single time, as if the tests were part of the compilation process (you’ll soon start relying upon them as if they are). For this book, a custom testing system was developed to ensure the correctness of the program output (and to display the output directly in the code listing), but the defacto standard JUnit testing system will also be used when appropriate. To make sure that testing is automatic, tests are run as part of the build process using Ant, an open-source tool that has also become a defacto standard in Java development, and CVS, another open-source tool that maintains a repository containing all your source code for a particular project.
JDK 1.4 introduced an assertion mechanism to aid in the verification of code at run time. One of the more compelling uses of assertions is Design by Contract (DBC), a formalized way to describe the correctness of a class. In conjunction with automated testing, DBC can be a powerful tool.
Sometimes unit testing isn’t enough, and you need to track down problems in a program that runs, but doesn’t run right. In JDK 1.4, the logging API was introduced to allow you to easily report information about your program. This is a significant improvement over adding and removing println( ) statements in order to track down a problem, and this section will go into enough detail to give you a thorough grounding in this API. This chapter also provides an introduction to debugging, showing the information a typical debugger can provide to aid you in the discovery of subtle problems. Finally, you’ll learn about profiling and how to discover the bottlenecks that cause your program to run too slowly.
Unit Testing
A recent realization in programming practice is the dramatic value of unit testing. This is the process of building integrated tests into all the code that you create and running those tests every time you do a build. That way, the build process can check for more than just syntax errors, since you teach it how to check for semantic errors as well. C-style programming languages, and C++ in particular, have typically valued performance over programming safety. The reason that developing programs in Java is so much faster than in C++ (roughly twice as fast, by most accounts) is because of Java’s safety net: features like garbage collection and improved type checking. By integrating unit testing into your build process, you can extend this safety net, resulting in faster development. You can also be bolder in the changes that you make, more easily refactor your code when you discover design or implementation flaws, and in general produce a better product, more quickly.
The effect of unit testing on development is so significant that it is used throughout this book, not only to validate the code in the book, but also to display the expected output. My own experience with unit testing began when I realized that, to guarantee the correctness of code in a book, every program in that book must be automatically extracted and organized into a source tree, along with an appropriate build system. The build system used in this book is Ant (described later in this chapter), and after you install it, you can just type ant to build all the code for the book. The effect of the automatic extraction and compilation process on the code quality of the book was so immediate and dramatic that it soon became (in my mind) a requisite for any programming book—how can you trust code that you didn’t compile? I also discovered that if I wanted to make sweeping changes, I could do so using search-and-replace throughout the book or just by bashing the code around. I knew that if I introduced a flaw, the code extractor and the build system would flush it out.
As programs became more complex, however, I also found that there was a serious hole in my system. Being able to successfully compile programs is clearly an important first step, and for a published book it seems a fairly revolutionary one; usually because of the pressures of publishing, it’s quite typical to randomly open a programming book and discover a coding flaw. However, I kept getting messages from readers reporting semantic problems in my code. These problems could be discovered only by running the code. Naturally, I understood this and took some early faltering steps toward implementing a system that would perform automatic execution tests, but I had succumbed to publishing schedules, all the while knowing that there was definitely something wrong with my process and that it would come back to bite me in the form of embarrassing bug reports (in the open source world,[87] embarrassment is one of the prime motivating factors towards increasing the quality of one’s code!).
The other problem was that I lacked a structure for the testing system. Eventually, I started hearing about unit testing and JUnit, which provided a basis for a testing structure. I found the initial versions of JUnit to be intolerable because they required the programmer to write too much code to create even the simplest test suite. More recent versions have significantly reduced this required code by using reflection, so they are much more satisfactory.
I needed to solve another problem, however, and that was to validate the output of a program and to show the validated output in the book. I had gotten regular complaints that I didn’t show enough program output in the book. My attitude was that the reader should be running the programs while reading the book, and many readers did just that and benefited from it. A hidden reason for that attitude, however, was that I didn’t have a way to test that the output shown in the book was correct. From experience, I knew that over time, something would happen so that the output was no longer correct (or, I wouldn’t get it right in the first place). The simple testing framework shown here not only captures the console output of the program—and most programs in this book produce console output—but it also compares it to the expected output that is printed in the book as part of the source-code listing, so readers can see what the output will be and also know that this output has been verified by the build process, and that they can verify it themselves.
I wanted to see if the test system could be even easier and simpler to use, applying the Extreme Programming principle of “do the simplest thing that could possibly work” as a starting point, and then evolving the system as usage demands. (In addition, I wanted to try to reduce the amount of test code in an attempt to fit more functionality in less code for screen presentations.) The result[88] is the simple testing framework described next.