As a quick thought experiment, the next time you are in your data
center, look around, and imagine for a moment that it is gone. And not
just the computers. Imagine that the entire building no longer exists.
Next, imagine that your job is to get as much of the work that was being
done in the data center going in some fashion, somewhere, as soon as
possible. What would you do?
By thinking about this, you have taken the first step of disaster
recovery. Disaster recovery is the ability to recover from an event
impacting the functioning of your organization's data center as quickly
and completely as possible. The type of disaster may vary, but the end
goal is always the same.
The steps involved in disaster recovery are numerous and
wide-ranging. Here is a high-level overview of the process, along with
key points to keep in mind.
A backup site is vital, but it is still useless without a disaster
recovery plan. A disaster recovery plan dictates every facet of the
disaster recovery process, including but not limited to:
- What events denote possible disasters
- What people in the organization have the authority to declare
  a disaster and thereby put the plan into effect
- The sequence of events necessary to prepare the backup site
  once a disaster has been declared
- The roles and responsibilities of all key personnel with
  respect to carrying out the plan
- An inventory of the necessary hardware and software required
  to restore production
- A schedule listing the personnel to staff the backup site,
  including a rotation schedule to support ongoing operations
  without burning out the disaster team members
- The sequence of events necessary to move operations from the
  backup site to the restored/new data center
Disaster recovery plans often fill multiple loose-leaf binders.
This level of detail is vital because in the event of an emergency,
the plan may well be the only thing left from your previous data
center (other than the last off-site backups, of course) to help you
rebuild and restore operations.
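As a minimal sketch of how the facets listed above might be tracked while a plan is being drafted, the following Python fragment checks a draft against a list of required sections. The section names and the `missing_sections` helper are hypothetical, chosen only to mirror the list above; a real plan would carry far more detail than a dictionary can hold.

```python
# Hypothetical section names mirroring the facets a plan should cover.
REQUIRED_SECTIONS = (
    "disaster_events",
    "declaration_authority",
    "backup_site_preparation",
    "roles_and_responsibilities",
    "hardware_software_inventory",
    "staffing_schedule",
    "return_to_data_center",
)


def missing_sections(plan):
    """Return the required sections that are absent or empty in `plan`."""
    return [name for name in REQUIRED_SECTIONS if not plan.get(name)]


if __name__ == "__main__":
    draft = {
        "disaster_events": ["fire", "flood"],
        "declaration_authority": [],  # present but still empty
    }
    for name in missing_sections(draft):
        print("Section still to be written:", name)
```

A check like this only confirms that each section exists; whether the content is adequate still requires review by people who know the data center.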
| Tip |
|---|
| While disaster recovery plans should be readily available at your workplace, copies should also be stored off-site. This way, a disaster that destroys your workplace will not take every copy of the disaster recovery plan with it. A good place to store a copy is your off-site backup storage location. If it does not violate your organization's security policies, copies may also be kept in key team members' homes, ready for instant use. |
Such an important document deserves serious thought (and possibly
professional assistance to create).
Once the plan is created, the knowledge it contains must be tested
periodically. Testing a disaster recovery
plan entails going through the actual steps of the plan: going to the
backup site and setting up the temporary data center, running
applications remotely, and resuming normal operations after the
"disaster" is over. Most tests do not attempt to perform 100% of the
tasks in the plan; instead, a representative system and application are
selected to be relocated to the backup site, put into production for a
period of time, and returned to normal operation at the end of the
test.
| Note |
|---|
| Although it is an overused phrase, a disaster recovery plan must be a living document; as the data center changes, the plan must be updated to reflect those changes. In many ways, an out-of-date disaster recovery plan can be worse than no plan at all, so make it a point to have regular (quarterly, for example) reviews and updates of the plan. |
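One way to keep regular reviews from slipping is to record the date of the last review and flag the plan when it falls behind. The sketch below assumes a quarterly interval, approximated as 91 days; the `plan_review_overdue` function and the `REVIEW_INTERVAL` constant are illustrative names, not part of any existing tool.

```python
from datetime import date, timedelta

# Assumption: "quarterly" is approximated as 91 days.
REVIEW_INTERVAL = timedelta(days=91)


def plan_review_overdue(last_reviewed, today, interval=REVIEW_INTERVAL):
    """Return True when more than `interval` has passed since the last review."""
    return today - last_reviewed > interval


if __name__ == "__main__":
    if plan_review_overdue(date(2024, 1, 15), date.today()):
        print("The disaster recovery plan is due for review.")
```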
One of the most important aspects of disaster recovery is to have
a location from which the recovery can take place. This location is
known as a backup site. In the event of a
disaster, a backup site is where your data center will be recreated,
and from which you will operate for the duration of the
disaster.
There are three different types of backup sites:
- Cold backup sites
- Warm backup sites
- Hot backup sites
Obviously these terms do not refer to the temperature of the
backup site. Instead, they refer to the effort required to begin
operations at the backup site in the event of a disaster.
A cold backup site is little more than an appropriately configured
space in a building. Everything required to restore service to your
users must be procured and delivered to the site before the process of
recovery can begin. As you can imagine, the delay going from a cold
backup site to full operation can be substantial.
Cold backup sites are the least expensive sites.
A warm backup site is already stocked with hardware representing a
reasonable facsimile of that found in your data center. To restore
service, the last backups from your off-site storage facility must be
delivered, and bare metal restoration completed, before the real work
of recovery can begin.
Hot backup sites have a virtual mirror image of your current data
center, with all systems configured and waiting only for the last
backups of your user data from your off-site storage facility. As you
can imagine, a hot backup site can often be brought up to full
production in no more than a few hours.
A hot backup site is the most expensive approach to disaster
recovery.
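The difference between the three site types can be summarized as the work that remains before recovery can begin. The following sketch encodes that reading in Python. The step wording paraphrases the descriptions above, and the backup delivery and bare metal restoration steps listed for a cold site are an inference rather than something stated explicitly; the names `BackupSite`, `STEPS_BEFORE_RECOVERY`, and `rank_by_effort` are hypothetical.

```python
from enum import Enum


class BackupSite(Enum):
    COLD = "cold"
    WARM = "warm"
    HOT = "hot"


# Work remaining before recovery can start at each type of site.
# Illustrative wording only; the cold-site backup and restoration
# steps are inferred, not quoted from the descriptions above.
STEPS_BEFORE_RECOVERY = {
    BackupSite.COLD: [
        "procure and deliver all hardware and software",
        "deliver the last backups from off-site storage",
        "complete bare metal restoration",
    ],
    BackupSite.WARM: [
        "deliver the last backups from off-site storage",
        "complete bare metal restoration",
    ],
    BackupSite.HOT: [
        "deliver the last backups of user data from off-site storage",
    ],
}


def rank_by_effort(sites):
    """Order sites from least to most preparation work (fewest steps first)."""
    return sorted(sites, key=lambda site: len(STEPS_BEFORE_RECOVERY[site]))


if __name__ == "__main__":
    for site in rank_by_effort(BackupSite):
        print(site.value, "->", "; ".join(STEPS_BEFORE_RECOVERY[site]))
```

Counting steps is a crude proxy for effort; in practice the procurement step at a cold site is likely to dominate everything else.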
Backup sites can come from three different sources:
- Companies specializing in providing disaster recovery
  services
- Other locations owned and operated by your organization
- A mutual agreement with another organization to share data
  center facilities in the event of a disaster
Each approach has its good and bad points. For example,
contracting with a disaster recovery firm often gives you access to
professionals skilled in guiding organizations through the process of
creating, testing, and implementing a disaster recovery plan. As you
might imagine, these services do not come without cost.
Using space in another facility owned and operated by your
organization can be essentially a zero-cost option, but stocking the
backup site and maintaining its readiness is still an expensive
proposition.
Crafting an agreement to share data centers with another
organization can be extremely inexpensive, but long-term operations
under such conditions are usually not possible, as the host's data
center must still maintain its normal production, making the
situation strained at best.
In the end, the selection of a backup site is a compromise between
cost and your organization's need for the continuation of
production.
Your disaster recovery plan must include methods of procuring the
necessary hardware and software for operations at the backup site. A
professionally-managed backup site may already have everything you
need (or you may need to arrange the procurement and delivery of
specialized materials the site does not have available); on the other
hand, a cold backup site means that a reliable source for every single
item must be identified. Often organizations work with manufacturers
to craft agreements for the speedy delivery of hardware and/or
software in the event of a disaster.
When a disaster is declared, it is necessary to notify your
off-site storage facility for two reasons:
| Tip |
|---|
| In the event of a disaster, the last backups you have from your old data center are vitally important. Consider having copies made before anything else is done, with the originals going back off-site as soon as possible. |
A data center is not of much use if it is totally disconnected
from the rest of the organization that it serves. Depending on the
disaster recovery plan and the nature of the disaster itself, your
user community might be located miles away from the backup site. In
these cases, good connectivity is vital to restoring
production.
Another kind of connectivity to keep in mind is telephone
connectivity. You must ensure that there are sufficient telephone
lines available to handle all verbal communication with your users.
What might have been a simple shout over a cubicle wall may now entail
a long-distance telephone conversation, so plan on more telephone
connectivity than might at first appear necessary.
The problem of staffing a backup site is multi-dimensional. One
aspect of the problem is determining the staffing required to run the
backup data center for as long as necessary. While a skeleton crew
may be able to keep things going for a short period of time, as the
disaster drags on, more people will be required to maintain the effort
needed to run under the extraordinary circumstances surrounding a
disaster.
This includes ensuring that personnel have sufficient time off to
unwind and possibly travel back to their homes. If the disaster was
wide-ranging enough to affect people's homes and families, additional
time must be allotted to allow them to manage their own disaster
recovery. Temporary lodging near the backup site is necessary, along
with the transportation required to get people to and from the backup
site and their lodgings.
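A rotation that gives everyone time off can be sketched as a simple round-robin over the available staff. The example below is one possible approach under stated assumptions: a fixed crew size per day, interchangeable staff, and no accounting for skills or travel time. The `rotation_schedule` function and the names in the usage example are hypothetical.

```python
def rotation_schedule(staff, crew_size, days):
    """Return one on-duty crew per day, rotating through `staff` round-robin.

    Assumes all staff are interchangeable and each day needs `crew_size` people.
    """
    if crew_size <= 0 or crew_size > len(staff):
        raise ValueError("crew_size must be between 1 and the number of staff")
    schedule = []
    start = 0
    for _ in range(days):
        # Take the next crew_size people, wrapping around the staff list.
        crew = [staff[(start + i) % len(staff)] for i in range(crew_size)]
        schedule.append(crew)
        start = (start + crew_size) % len(staff)
    return schedule


if __name__ == "__main__":
    team = ["ana", "ben", "cy", "dee", "eli", "fay"]
    for day, crew in enumerate(rotation_schedule(team, 2, 6), start=1):
        print("Day", day, "on duty:", ", ".join(crew))
```

With six people and a crew of two, each person works one day in three, which leaves room for rest and for travel between the backup site and home.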
Often a disaster recovery plan includes on-site representative
staff from all parts of the organization's user community. This
depends on the ability of your organization to operate with a remote
data center. If user representatives must work at the backup site,
similar accommodations must be made available for them, as well.
Eventually, all disasters end. The disaster recovery plan must
address this phase as well. The new data center must be outfitted
with all the necessary hardware and software; while this phase often
does not have the time-critical nature of the preparations made when
the disaster was initially declared, backup sites cost money every day
they are in use, so economic concerns dictate that the switchover take
place as quickly as possible.
The last backups from the backup site must be made and delivered
to the new data center. After they are restored onto the new
hardware, production can be switched over to the new data
center.
At this point the backup data center can be decommissioned, with
the disposition of all temporary hardware dictated by the final
section of the plan. Finally, a review of the plan's effectiveness is
held, with any changes recommended by the reviewing committee
integrated into an updated version of the plan.