The most common scenario is a live running service that needs to be
upgraded with a new version of the code. The new code has been
prepared and uploaded to the production server, and the server has
been restarted. Unfortunately, the service does not work anymore.
What could be worse than that? There is no way back, because the
original code has been overwritten with the new but non-working code.
Another scenario is where a whole set of files is being transferred
to the live server but some network problem has occurred in the
middle, which has slowed things down or totally aborted the transfer.
With some of the files old and some new, the service is most likely
broken. Since some files were overwritten, you can't
roll back to the previously working version of the service.
No matter what file transfer technique is used, be it FTP, NFS, or
anything else, live running code should never be directly overwritten
during file transfer. Instead, files should be transferred to a
temporary directory on the live machine, ready to be moved when
necessary. If the transfer fails, it can then be restarted safely.
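The staging idea can be sketched as a small Bourne-shell function. This is a minimal illustration under stated assumptions, not a prescribed procedure: stage_release and the .partial suffix are hypothetical names, and cp -R merely stands in for whatever transfer tool is actually used.

```shell
# stage_release SRC STAGE (hypothetical helper)
# Copies SRC into a scratch directory next to STAGE and renames it to STAGE
# only after the whole copy has succeeded. The live directory is never touched.
stage_release() {
    src=$1
    stage=$2
    rm -rf "$stage.partial" &&          # discard leftovers from a failed attempt
    cp -R "$src" "$stage.partial" &&    # stand-in for the real transfer step
    rm -rf "$stage" &&                  # drop any older, unused staging copy
    mv "$stage.partial" "$stage"        # publish the complete copy
}
```

Because a failed copy only ever leaves a .partial directory behind, the function can simply be run again after a network problem.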
Both scenarios can be made safer with two approaches. First, do not
overwrite working files. Second, use a revision control system such
as CVS so that changes to working code can easily be undone if the
working code is accidentally overwritten. Revision control will be
covered later in this chapter.
We recommend performing all updates on the live server in the
following sequence. Assume for this example that the
project's code directory is
/home/httpd/perl/rel. When
we're about to update the files, we create a new
directory, /home/httpd/perl/test, into which we
copy the new files. Then we do some final sanity checks: we check that
the files are readable and executable by the user the server runs
as, and we run perl -Tcw on the new
modules to make sure there are no syntax errors in them.
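These sanity checks could be scripted roughly as follows. This is only a sketch: check_release is a hypothetical helper, the choice of *.pm and *.pl files is an assumption, and the readability test reflects the user running the script, so it is meaningful only when run as the server's user. Modules that load sibling modules from the same tree may also need an -I option that is not shown here.

```shell
# check_release DIR (hypothetical helper)
# Reports every Perl file under DIR that is unreadable by the current user
# or that fails "perl -Tcw"; returns non-zero if any such file is found.
check_release() {
    dir=$1
    bad=$(find "$dir" -type f \( -name '*.pm' -o -name '*.pl' \) |
        while IFS= read -r f; do
            if [ ! -r "$f" ]; then
                echo "$f (not readable)"
            elif ! perl -Tcw "$f" >/dev/null 2>&1; then
                echo "$f (failed perl -Tcw)"
            fi
        done)
    if [ -n "$bad" ]; then
        printf 'Problems found:\n%s\n' "$bad" >&2
        return 1
    fi
    return 0
}
```

A call such as check_release /home/httpd/perl/test would then be made before touching the live directory.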
To save typing, we set up aliases for several
apachectl commands and for
tailing the error_log file:
panic% alias graceful /home/httpd/httpd_perl/bin/apachectl graceful
panic% alias restart /home/httpd/httpd_perl/bin/apachectl restart
panic% alias start /home/httpd/httpd_perl/bin/apachectl start
panic% alias stop /home/httpd/httpd_perl/bin/apachectl stop
panic% alias err tail -f /home/httpd/httpd_perl/logs/error_log
Finally, when we think we are ready, we do:
panic% cd /home/httpd/perl
panic% mv rel old && mv test rel && stop && sleep 3 && restart && err
Note that all the commands are typed as a single line, joined by
&&, and only at the end should the Enter
key be pressed. The && ensures that if any
command fails, the following commands will not be executed.
The elements of this command line are:
- mv rel old &&
  Backs up the working directory to old, so none
  of the original code is deleted or overwritten
- mv test rel &&
  Puts the new code in place of the original
- stop &&
  Stops the server
- sleep 3 &&
  Allows the server a few seconds to shut down (it might need a longer
  sleep)
- restart &&
  Restarts the server
- err
  Tails the error_log file to
  make sure that everything is OK
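The same sequence can be wrapped in a Bourne-shell function, which may be convenient when the swap is performed often. This is a sketch under stated assumptions: swap_release and the SLEEP_SECS variable are hypothetical, the apachectl path is passed in rather than taken from the aliases, and the check for an existing old directory is an added safeguard (without it, mv rel old would move rel inside a leftover old directory instead of renaming it).

```shell
# swap_release BASE CTL (hypothetical helper)
# BASE contains the rel/ and test/ directories; CTL is the path to apachectl.
# The body runs in a subshell, so the caller's working directory is unchanged.
swap_release() (
    base=$1
    ctl=$2
    cd "$base" || exit 1
    # Added safeguard: never clobber or nest into an earlier backup.
    if [ -e old ]; then
        echo "old already exists in $base; remove or rename it first" >&2
        exit 1
    fi
    mv rel old &&                   # back up the working code
    mv test rel &&                  # put the new code in place
    "$ctl" stop &&                  # stop the server
    sleep "${SLEEP_SECS:-3}" &&     # give it a few seconds to shut down
    "$ctl" restart                  # bring the server back up
)
```

It might be invoked as swap_release /home/httpd/perl /home/httpd/httpd_perl/bin/apachectl, followed by a look at the error_log.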
If mv is overridden by a global alias
mv -i, which requires confirming every action,
you will need to call mv -f to override the
-i option.
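When updating code on a remote machine, it's a good
idea to run the command line under nohup. Because
nohup protects only the command it directly precedes, and
shell aliases are not available inside a child shell, wrap the whole
chain in a single sh -c invocation and spell out the
apachectl path:
panic% nohup sh -c 'mv rel old && mv test rel && /home/httpd/httpd_perl/bin/apachectl stop && sleep 3 && /home/httpd/httpd_perl/bin/apachectl restart'
This approach ensures that if the connection is suddenly dropped, the
sequence still runs to completion, so the server will not stay down
if the last command that executed before the drop was
stop. Once the command returns, run err to check the
error_log.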
apachectl generates its status messages a little
too early. For example, when we execute apachectl
stop, a message saying that the server has been stopped is
displayed, when in fact the server is still running. Similarly, when
we execute apachectl start, a message is
displayed saying that the server has been started, while it is
possible that it hasn't yet. In both cases, this
happens because these status messages are not generated by Apache
itself. Do not rely on them. Rely on the
error_log file instead, where the running Apache
server indicates its real status.
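As a complement to reading the error_log, a script can poll the server's parent process until it has really exited instead of trusting the status message. The following is a sketch, not something taken from the text above: wait_for_exit is a hypothetical helper, the PID would normally be read from the server's PID file (whose location depends on the configuration), and kill -0 only gives a reliable answer when the caller is allowed to signal that process.

```shell
# wait_for_exit PID [TIMEOUT] (hypothetical helper)
# Polls once per second until process PID no longer exists.
# Returns 0 when the process is gone, 1 if it is still there after TIMEOUT
# seconds (default 30).
wait_for_exit() {
    pid=$1
    timeout=${2:-30}
    waited=0
    # kill -0 sends no signal; it only tests whether the process can be signaled.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Used between stop and the next start, such a check replaces a fixed sleep with a wait that lasts only as long as the shutdown actually takes.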
Also note that we use restart and not just
start. This is because
Apache can take a long time to stop if it has
to run lots of destruction and cleanup code on exit. If
start is used and Apache has not yet released
the port it is listening on, the start will fail and the
error_log will report that the port is in use.
For example:
Address already in use: make_sock: could not bind to port 8000
However, if restart is used,
apachectl will wait for the server to quit and
unbind the port and will then cleanly restart it.
Now, what happens if the new modules are broken and the newly
restarted server reports problems or refuses to start at all?
The aliased err command executes tail
-f on the error_log, so that the
failed restart or any other problems will be immediately apparent.
The situation can quickly and easily be rectified by returning the
system to its pre-upgrade state with this command:
panic% mv rel bad && mv old rel && stop && sleep 3 && restart && err