A: Coding Style
This appendix is not about
indenting and placement of parentheses and curly braces, although that will be
mentioned. It is about the general guidelines used in
this book for organizing the code
listings.
Although many of these issues have been
introduced throughout the book, this appendix appears at the end so it can be
assumed that every topic is fair game, and if you don’t understand
something you can look it up in the appropriate section.
All the decisions about coding style in
this book have been deliberately considered and made, sometimes over a period of
years. Of course, everyone has their reasons for organizing code the way they
do, and I’m just trying to tell you how I arrived at mine and the
constraints and environmental factors that brought me to those
decisions.
In the text of this book, identifiers
(function, variable, and class names) are set in bold. Most keywords will
also be set in bold, except for those keywords that are used so much that the
bolding can become tedious, such as “class” and
“virtual.”
I use a particular coding style for the
examples in this book. It was developed over a number of years, and was
partially inspired by Bjarne Stroustrup’s style in
his original The C++ Programming
Language.[64]
The subject of formatting style is good for hours of hot debate, so I’ll
just say I’m not trying to dictate correct style via my examples; I have
my own motivation for using the style that I do. Because C++ is a free-form
programming language, you can continue to use whatever style you’re
comfortable with.
That said, I will note that it is
important to have a consistent formatting style within a project. If you search
the Internet, you will find a number of tools that can be used to reformat all
the code in your project to achieve this valuable consistency.
The programs in this book are files that
are automatically extracted from the text of the book, which allows them to be
tested to ensure that they work correctly. Thus, the code files printed in the
book should all work without compile-time errors when compiled with an
implementation that conforms to Standard C++ (note that not all compilers
support all language features). The errors that should cause compile-time
error messages are commented out with the comment //! so they can be
easily discovered and tested using automatic means. Errors discovered and
reported to the author will appear first in the electronic version of the book
(at www.BruceEckel.com) and later in updates of the
book.
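As a minimal sketch of the //! convention just described, a line that is supposed to be rejected by the compiler stays in the listing but is disabled. The names LIMIT and clampToLimit are hypothetical and are not taken from any listing in the book:

```cpp
// Hypothetical fragment illustrating the //! convention.
const int LIMIT = 10;

int clampToLimit(int value) {
  //! LIMIT = 20; // Error: LIMIT is const
  return value > LIMIT ? LIMIT : value;
}
```

Removing the //! from the marked line should produce a compile-time error on that line, which is the kind of thing an automated check can look for.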
One of the standards in this book is that
all programs will compile and link without errors (although they will sometimes
cause warnings). To this end, some of the programs, which demonstrate only a
coding example and don’t represent stand-alone programs, will have empty
main( ) functions, like this:
int main() {}
This allows the linker to complete
without an error.
The standard for main( ) is
to return an int, but Standard C++ states that if there is no
return statement inside main( ), the compiler will
automatically generate code to return 0. This option (no return
statement in main( )) will be used in this book (some
compilers may still generate warnings for this, but those are not compliant with
Standard C++).
In C, it has been traditional to name header files (containing declarations) with
an extension of .h and implementation files (that cause storage to be
allocated and code to be generated) with an extension of .c. C++ went
through an evolution. It was first developed on Unix, where the operating system
was aware of upper and lower case in file names. The original file names were
simply capitalized versions of the C extensions: .H and .C. This
of course didn’t work for operating systems that didn’t distinguish
upper and lower case, such as DOS. DOS C++ vendors used extensions of hxx
and cxx for header files and implementation files, respectively, or
hpp and cpp. Later, someone figured out that the only reason you
needed a different extension for a file was so the compiler could determine
whether to compile it as a C or C++ file. Because the compiler never compiled
header files directly, only the implementation file extension needed to be
changed. The custom across virtually all systems is now to use
cpp for implementation files and h for header files. Note that
Standard C++ header files are included with no file name
extension, for example: #include <iostream>.
Begin and end comment tags
A very important issue with this book is that all code that you see in the book
must be verified to be correct (with at least one compiler). This is
accomplished by automatically extracting the files from the book. To facilitate
this, all code listings that are meant to be compiled (as opposed to code
fragments, of which there are few) have comment tags at the beginning and end.
These tags are used by the code-extraction tool ExtractCode.cpp in Volume
2 of this book (which you can find on the Web site www.BruceEckel.com) to
pull each code listing out of the plain-ASCII text version of this
book.
The end-listing tag simply tells
ExtractCode.cpp that it’s the end of the listing, but the
begin-listing tag is followed by information about what subdirectory the file
belongs in (generally organized by chapters, so a file that belongs in Chapter 8
would have a tag of C08), followed by a colon and the name of the listing
file.
Because ExtractCode.cpp also
creates a makefile for each subdirectory,
information about how a program is made and the command line used to test it is
also incorporated into the listings. If a program is stand-alone (it
doesn’t need to be linked with anything else) it has no extra information.
This is also true for header files. However, if it doesn’t contain a
main( ) and is meant to be linked with something else, then it has
an {O} after the file name. If this listing is meant to be the main
program but needs to be linked with other components, there’s a separate
line that begins with //{L} and continues with all the files that need to
be linked (without extensions, since those can vary from platform to
platform).
You can find examples throughout the
book.
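The tagging scheme described above might look like the following sketch. This appendix does not spell out the exact characters of the tags, so the sketch assumes the //: begin-tag and ///:~ end-tag forms seen in the book's listings. The file name Doubler.cpp and the function doubler( ) are hypothetical:

```cpp
//: C08:Doubler.cpp {O}
// Hypothetical listing (not from the book). It has no
// main( ) and is meant to be linked with something
// else, so {O} follows the file name in the begin tag.
int doubler(int a) {
  return a * 2;
} ///:~
```

Under the same assumptions, a main program that needs this file linked in would start with its own begin-tag line, //: C08:UseDoubler.cpp, followed by a separate line reading //{L} Doubler (no extension, since those vary from platform to platform).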
If a file should be extracted but the
begin- and end-tags should not be included in the extracted file (for example,
if it’s a file of test data) then the begin-tag is immediately followed by
a ‘!’.
Parentheses, braces, and indentation
You may notice the formatting style in
this book is different from many traditional C styles. Of course, everyone
thinks their own style is the most rational. However, the style used here has a
simple logic behind it, which will be presented here mixed in with ideas on why
some of the other styles developed.
The formatting style is motivated by one
thing: presentation, both in print and in live seminars. You may feel your needs
are different because you don’t make a lot of presentations. However,
working code is read much more than it is written, and so it should be easy for
the reader to perceive. My two most important criteria are
“scannability” (how easy it is for the reader to grasp the meaning
of a single line) and the number of lines that can fit on a page. This latter
may sound funny, but when you are giving a live presentation, it’s very
distracting for the audience if the presenter must shuffle back and forth
between slides, and a few wasted lines can cause this.
Everyone seems to agree that code inside
braces should be indented. What people don’t agree on – and the
place where there’s the most inconsistency within formatting styles
– is this: Where does the opening brace go? This one question, I think, is
what causes such variations among coding styles. (For an enumeration of coding
styles, see C++ Programming Guidelines, by Tom Plum and
Dan Saks, Plum Hall 1991.) I’ll try to convince
you that many of today’s coding styles come from pre-Standard C
constraints (before function prototypes) and are thus inappropriate
now.
First, my answer to that key question:
the opening brace should always go on the same line as the
“precursor” (by which I mean “whatever the body is about: a
class, function, object definition, if statement, etc.”). This is a
single, consistent rule I apply to all of the code I write, and it makes
formatting much simpler. It makes the “scannability” easier –
when you look at this line:
int func(int a);
you know, by the semicolon at the end of
the line, that this is a declaration and it goes no further, but when you see
the line:
int func(int a) {
you immediately know it’s a
definition because the line finishes with an opening brace, not a semicolon. By
using this approach, there’s no difference in where you place the opening
brace for a multi-line definition:
int func(int a) {
  int b = a + 1;
  return b * 2;
}
and for a single-line definition that is
often used for inlines:
int func(int a) { return (a + 1) * 2; }
Similarly, for a class:
class Thing;
is a class name declaration,
and
class Thing {
is a class definition. You can tell by
looking at the single line in all cases whether it’s a declaration or
definition. And of course, putting the opening brace on the same line, instead
of a line by itself, allows you to fit more lines on a page.
So why do we have so many other styles?
In particular, you’ll notice that most people create classes following the
style above (which Stroustrup uses in all editions of his book The C++
Programming Language from Addison-Wesley) but create function definitions by
putting the opening brace on a single line by itself (which also engenders many
different indentation styles). Stroustrup does this except for short inline
functions. With the approach I describe here, everything is consistent –
you name whatever it is (class, function, enum, etc.) and on that
same line you put the opening brace to indicate that the body for this thing is
about to follow. Also, the opening brace is the same for short inlines and
ordinary function definitions.
I assert that the style of function
definition used by many folks comes from pre-function-prototyping C, in which
you didn’t declare the arguments inside the parentheses, but instead
between the closing parenthesis and the opening curly brace (this shows
C’s assembly-language roots):
void bar(x, y)
  int x;
  float y;
{
  /* body here */
}
Here, it would be quite ungainly to put
the opening brace on the same line, so no one did it. However, they did make
various decisions about whether the braces should be indented with the body of
the code or whether they should be at the level of the “precursor.”
Thus, we got many different formatting styles.
There are other arguments for placing the
brace on the line immediately following the declaration (of a class, struct,
function, etc.). The following came from a reader, and is presented here so you
know what the issues are:
Experienced ‘vi’ (vim) users
know that typing the ‘]’ key twice will take the user to the next
occurrence of ‘{’ (or ^L) in column 0. This feature is extremely
useful in navigating code (jumping to the next function or class definition).
[My comment: when I was initially working under Unix, GNU Emacs was just
appearing and I became enmeshed in that. As a result, ‘vi’ has never
made sense to me, and thus I do not think in terms of “column 0
locations.” However, there is a fair contingent of ‘vi’ users
out there, and they are affected by this issue.]
Placing the ‘{’ on the next
line eliminates some confusing code in complex conditionals, aiding in the
scannability. Example:
if(cond1
   && cond2
   && cond3) {
  statement;
}
The above [asserts the reader] has poor
scannability. However,
if (cond1
    && cond2
    && cond3)
{
  statement;
}
breaks up the ‘if’ from the
body, resulting in better readability. [Your opinions on whether this is true
will vary depending on what you’re used to.]
Finally, it’s much easier to
visually align braces when they are aligned in the same column. They visually
"stick out" much better. [End of reader comment]
The issue of where to put the opening
curly brace is probably the most discordant issue. I’ve learned to scan
both forms, and in the end it comes down to what you’ve grown comfortable
with. However, I note that the official Java coding standard (found on
Sun’s Java Web site) is effectively the same as the one I present here
– since more folks are beginning to program in both languages, the
consistency between coding styles may be helpful.
The approach I use removes all the
exceptions and special cases, and logically produces a single style of
indentation as well. Even within a function body, the consistency holds, as
in:
for(int i = 0; i < 100; i++) {
  cout << i << endl;
  cout << x * i << endl;
}
The style is easy to teach and to
remember – you use a single, consistent rule for all your formatting, not
one for classes, two for functions (one-line inlines vs. multi-line), and
possibly others for for loops, if statements, etc. The consistency
alone, I think, makes it worthy of consideration. Above all, C++ is a newer
language than C, and although we must make many concessions to C, we
shouldn’t be carrying too many artifacts with us that cause problems in
the future. Small problems multiplied by many lines of code become big problems.
For a thorough examination of the subject, albeit in C, see C Style:
Standards and Guidelines, by David Straker
(Prentice-Hall 1992).
The other constraint I must work under is
the line width, since the book has a limit of 50 characters per line. What happens
when something is too long to fit on one line? Well, again I strive to have a
consistent policy for the way lines are broken up, so they can be easily viewed.
As long as something is part of a single definition, argument list, etc.,
continuation lines should be indented one level in from the beginning of that
definition, argument list, etc.
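A small sketch of this line-breaking policy follows. The function makeLabel( ) and its parameters are hypothetical, and std::to_string( ) assumes a C++11 or later compiler. Both the argument list and the long expression are too wide for 50 characters, so each continuation line is indented one level in from where its construct begins:

```cpp
#include <string>

// Hypothetical function, used only to show how
// over-long lines are broken and indented.
std::string makeLabel(const std::string& first,
  const std::string& last, int idNumber) {
  std::string label = last + ", " + first
    + " #" + std::to_string(idNumber);
  return label;
}
```

For example, makeLabel("Ada", "Lovelace", 7) builds the string "Lovelace, Ada #7".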
Those familiar with Java will notice that
I have switched to using the standard Java style for all identifier names.
However, I cannot be completely consistent here because identifiers in the
Standard C and C++ libraries do not follow this style.
The style is quite straightforward. The
first letter of an identifier is capitalized only if that identifier names a class.
If it is a function or variable, then the first letter is lowercase. The rest of
the identifier consists of one or more words, run together but distinguished by
capitalizing each word. So a class looks like this:
class FrenchVanilla : public IceCream {
an object identifier looks like
this:
FrenchVanilla myIceCreamCone(3);
and a function looks like
this:
void eatIceCreamCone();
(for either a member function or a
regular function).
The one exception is for compile-time
constants (const or #define), in which all of the letters in the
identifier are uppercase.
The value of the style is that
capitalization has meaning – you can see from the first letter whether
you’re talking about a class or an object/method. This is especially
useful when static class members are accessed.
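To see how the capitalization carries meaning, here is a short sketch that puts the rules together. All of the names (MAX_SCOOPS, IceCreamCone, conesMade( ), and so on) are hypothetical and chosen only for illustration:

```cpp
// Hypothetical names, chosen only to show the rules.
const int MAX_SCOOPS = 3; // Compile-time constant

class IceCreamCone { // Class: first letter capitalized
public:
  explicit IceCreamCone(int scoops)
    : scoopCount(scoops) { ++coneCount; }
  bool isFull() const {
    return scoopCount >= MAX_SCOOPS;
  }
  // Static member function: lowercase first letter
  static int conesMade() { return coneCount; }
private:
  int scoopCount; // Variable: lowercase first letter
  static int coneCount;
};

int IceCreamCone::coneCount = 0;
```

Given an object defined as IceCreamCone myIceCreamCone(3), the call IceCreamCone::conesMade( ) goes through the class, while myIceCreamCone.isFull( ) goes through an object. The first letter alone tells you which is which.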
Order of header inclusion
Headers are included in order from “the most specific to the most general.” That
is, any header files in the local directory are included first, then any of my
own “tool” headers, such as
require.h, then any third-party library headers,
then the Standard C++ Library headers, and finally the C library
headers.
The justification for this comes from
John Lakos in Large-Scale C++ Software Design
(Addison-Wesley, 1996):
Latent usage errors can be avoided by
ensuring that the .h file of a component parses by itself – without
externally-provided declarations or definitions... Including the .h file as the
very first line of the .c file ensures that no critical piece of information
intrinsic to the physical interface of the component is missing from the .h file
(or, if there is, that you will find out about it as soon as you try to compile
the .c file).
If the order of header inclusion goes
“from most specific to most general,” then it’s more likely
that if your header doesn’t parse by itself, you’ll find out about
it sooner and prevent annoyances down the road.
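The ordering might be sketched as follows for a hypothetical implementation file. Widget.h and the third-party header are placeholders, not real files, so those includes are commented out to keep the fragment compilable on its own. The require.h line is commented out for the same reason, since that header is not reproduced here:

```cpp
// Hypothetical implementation file, Widget.cpp.
//#include "Widget.h"         // 1. Local directory
//#include "require.h"        // 2. My own "tool" headers
//#include <thirdparty/lib.h> // 3. Third-party library
#include <string>             // 4. Standard C++ Library
#include <cstring>            // 5. C library

// Uses one facility from each of the last two groups.
std::size_t nameLength(const std::string& name) {
  return std::strlen(name.c_str());
}
```

Because Widget.h would come first, anything it forgot to include for itself would show up as an error right away instead of being masked by an earlier, more general header.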
Include guards on header files
Include guards are always used inside header files to prevent multiple inclusion of
a header file during the compilation of a single .cpp file. The include
guards are implemented using a preprocessor #define and checking to see
that a name hasn’t already been defined. The name used for the guard is
based on the name of the header file, with all letters of the file name
uppercase and replacing the ‘.’ with an underscore. For
example:
// IncludeGuard.h
#ifndef INCLUDEGUARD_H
#define INCLUDEGUARD_H
// Body of header file here...
#endif // INCLUDEGUARD_H
The identifier on the last line is
included for clarity. Although some preprocessors ignore any characters after
an #endif, that isn’t standard behavior, so the identifier is placed in a
comment.
In header files, any “pollution” of the namespace in which the header is
included must be scrupulously avoided. That is, if you change the namespace
outside of a function or class, you will cause that change to occur for any file
that includes your header, resulting in all kinds of problems. No
using declarations of any kind are allowed
outside of function definitions, and no global using
directives are allowed in header files.
In cpp files, any global
using directives will only affect that file, and so in this book they are
generally used to produce more easily-readable code, especially in small
programs.
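The header half of this rule can be sketched as follows. Greeting.h and greeting( ) are hypothetical names. The header qualifies every library name explicitly and confines its one using declaration to a function body, so nothing leaks into the files that include it:

```cpp
// Greeting.h (hypothetical header)
#ifndef GREETING_H
#define GREETING_H
#include <string>

// No "using namespace std;" here: it would leak into
// every file that includes this header.
inline std::string greeting(const std::string& name) {
  using std::string; // Allowed: confined to function
  string result = "Hello, ";
  result += name;
  return result;
}

#endif // GREETING_H
```

A cpp file that includes this header is then free to say using namespace std; after its includes, since that directive affects only that one file.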
Use of require( ) and assure( )
The require( ) and assure( ) functions defined in require.h are used
consistently throughout most of the book, so that they may properly report
problems. If you are familiar with the concepts of preconditions and
postconditions (introduced by Bertrand Meyer) you will recognize that the use
of require( ) and assure( ) more or less provides preconditions (usually) and
postconditions (occasionally).
Thus, at the beginning of a function, before any of the “core” of
the function is executed, the preconditions are checked to make sure everything
is proper and that all of the necessary conditions are correct. Then the
“core” of the function is executed, and sometimes some
postconditions are checked to make sure that the new state of the data is within
defined parameters. You’ll notice that the postcondition checks are rare
in this book, and assure( ) is primarily used to make sure that
files were opened successfully.
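The precondition / core / postcondition shape described above can be sketched like this. The real require.h is not reproduced in this appendix, so the require( ) and assure( ) below are hypothetical stand-ins: they take a bool and a message and throw an exception, which may well differ from what the book's versions do. The function averageScore( ) is also invented for illustration:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real require.h is not
// shown here and may behave differently.
inline void require(bool condition,
  const std::string& msg = "Requirement failed") {
  if(!condition) throw std::logic_error(msg);
}

inline void assure(bool condition,
  const std::string& msg = "Assurance failed") {
  if(!condition) throw std::runtime_error(msg);
}

// Average of a non-empty set of non-negative scores.
double averageScore(const std::vector<int>& scores) {
  // Preconditions, checked before the "core" runs:
  require(!scores.empty(), "scores must not be empty");
  for(std::size_t i = 0; i < scores.size(); i++)
    require(scores[i] >= 0, "scores must be >= 0");
  // The "core" of the function:
  double sum = 0;
  for(std::size_t i = 0; i < scores.size(); i++)
    sum += scores[i];
  double result = sum / scores.size();
  // Postcondition: result is within defined bounds
  assure(result >= 0, "average out of range");
  return result;
}
```

Throwing makes the stand-ins easy to exercise in a test: calling averageScore( ) with an empty vector raises std::logic_error before any of the core code runs.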