[BreachExchange] A disaster recovery plan: What is your IT team keeping from you?

Inga Goddijn inga at riskbasedsecurity.com
Wed Jun 29 22:49:54 EDT 2016


http://www.cloudcomputing-news.net/news/2016/jun/29/disaster-recovery-plan-what-your-it-team-keeping-you/

Your disaster recovery program is like a parachute - you don’t want to find
yourself in freefall before you discover it won’t open. But amid accelerating
development cycles and mounting cost, resource and time pressures, many CIOs
are failing to give DR planning and testing the priority they need.

While IT teams are running to stand still with day-to-day responsibilities,
DR efforts tend to be focused solely on infrastructure, hardware and
software, neglecting the people and processes needed to execute the plan.
At best, this runs the risk of failed recovery testing. At worst, a
business may be brought to its knees at a time of actual disaster without
any chance of a swift recovery.

Your team may be reluctant to flag areas of concern, or admit that they
aren’t confident your DR plan will work in practice. Perhaps they’re
relying on the belief that “disaster” is a statistically unlikely freak of
nature (we all know hurricanes hardly ever happen in Hertford, Hereford and
Hampshire) rather than a mundane but eminently more probable hardware
failure or human error. It’s possible that at least one of these admissions
may be left unspoken in your own organisation:
*“We’re not confident of meeting our RTOs/RPOs”*

Even if you passed your last annual DR test, it’s only a predictor of
recovery, not a guarantee. Most testing takes place under managed
conditions and takes months to plan, whereas in real life, outages strike
without notice. Mission-critical applications have multiple dependencies
that change frequently, so without ongoing tests, a recovery plan that
worked only a few months ago might now fail to restore availability to a
critical business application.
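
As a rough illustration (the applications, targets and test results below are
entirely hypothetical), even a simple check like the following makes it
obvious whether the results of your last test actually met the recovery
targets you have committed to:

from datetime import timedelta

# Assumed per-application targets -- values here are illustrative only.
targets = {
    "order-processing": {"rto": timedelta(hours=1), "rpo": timedelta(minutes=15)},
    "reporting": {"rto": timedelta(hours=8), "rpo": timedelta(hours=4)},
}

# Results captured during the last recovery test (also illustrative).
# Note that "reporting" was never tested at all.
observed = {
    "order-processing": {"recovery_time": timedelta(hours=2, minutes=10),
                         "data_loss_window": timedelta(minutes=25)},
}

for app, target in targets.items():
    result = observed.get(app)
    if result is None:
        print(f"{app}: never tested, so there is no evidence it can be recovered")
        continue
    rto_met = result["recovery_time"] <= target["rto"]
    rpo_met = result["data_loss_window"] <= target["rpo"]
    print(f"{app}: RTO {'met' if rto_met else 'missed'}, "
          f"RPO {'met' if rpo_met else 'missed'}")

A report like this only stays meaningful if the tests behind it are repeated
as the application and its dependencies change.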
*“Our DR plan only scratches the surface”*

Many organisations overlook the impact of disruption on staff and the
long-term availability of their data centres. How long you can sustain
operations from your recovery centre during an outage – whether that’s days or
weeks – will
determine your DR approach. Can you anticipate what you would do in a major
disaster if you lost power, buildings or communication links? What if you
can’t get the right people to the right places? How well is everyone
informed of procedures and chains of command? People and processes are as
relevant as technology when it comes to rigorous DR planning.
*“We know how to fail over… just not how to fail back”*

Failback – reinstating your production environment – can be the most
disruptive element of a DR execution, because most processes have to be
performed in reverse. Yet organisations often omit the process of testing
their capabilities to recover back to the primary environment. When push
comes to shove, failure to document and test this component of the DR plan
could force a business to rely on its secondary site for longer than
anticipated, adding significant costs and putting a strain on staff.
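
As a simple sketch (the steps below are hypothetical and deliberately
simplified), failback is essentially the failover sequence undone in the
opposite order, and each reversal needs its own documented, rehearsed
procedure:

# Hypothetical failover sequence for a single application stack.
failover_steps = [
    "freeze writes on the primary database",
    "promote the secondary database",
    "repoint DNS to the recovery site",
    "resume application traffic at the recovery site",
]

# Failback is not simply "run it again": each step must be reversed,
# and in the opposite order to the way it was applied.
failback_steps = [f"reverse: {step}" for step in reversed(failover_steps)]

print("Failover:")
for i, step in enumerate(failover_steps, 1):
    print(f"  {i}. {step}")
print("Failback:")
for i, step in enumerate(failback_steps, 1):
    print(f"  {i}. {step}")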
*“Our runbooks are a little dusty”*

How often do you evaluate and update your runbooks? Almost certainly not
frequently enough. They should contain all the information your team needs
to perform day-to-day operations and respond to emergency situations,
including resource information about your primary data centre and its
hardware and software, and step-by-step recovery procedures for operational
processes. If this “bible” isn’t kept up to date and thoroughly scrutinised
by key stakeholders, your recovery process is likely to stall, if not grind
to a halt.
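
One way to keep the pressure on (the runbook names, dates and review interval
below are hypothetical) is to track when each runbook was last reviewed and
flag anything that has gone stale:

from datetime import date, timedelta

MAX_AGE = timedelta(days=90)  # assumed review interval; set to your own policy

runbooks = [
    {"name": "failover-primary-db", "last_reviewed": date(2016, 5, 2)},
    {"name": "restore-file-shares", "last_reviewed": date(2015, 8, 19)},
    {"name": "rebuild-app-servers", "last_reviewed": date(2016, 1, 11)},
]

today = date(2016, 6, 29)  # fixed for reproducibility; use date.today() in practice
for rb in runbooks:
    age = today - rb["last_reviewed"]
    if age > MAX_AGE:
        print(f"STALE: {rb['name']} last reviewed {age.days} days ago")
    else:
        print(f"OK:    {rb['name']} reviewed {age.days} days ago")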
*“Change management hasn’t changed”*

Change is a constant of today’s highly dynamic production environments, in
which applications can be deployed, storage provisioned and new systems set
up with unprecedented speed. But the ease and frequency with which these
changes are introduced means they’re not always reflected in your recovery
site. The deciding factor in a successful recovery is whether you’ve stayed
on top of formal day-to-day change management so that your secondary
environment is in perfect sync with your live production environment.
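
A lightweight drift check along these lines (the inventories below are
hypothetical) can highlight changes that were made in production but never
replicated to the recovery site:

# Hypothetical inventories of what runs where, and at which version.
production = {
    "web-frontend": {"version": "3.4.1", "instances": 6},
    "order-db":     {"version": "9.6",   "instances": 2},
    "search-index": {"version": "2.3",   "instances": 3},  # added last sprint
}
recovery_site = {
    "web-frontend": {"version": "3.4.1", "instances": 6},
    "order-db":     {"version": "9.5",   "instances": 2},  # never upgraded
    # "search-index" is missing entirely -- the change was never replicated
}

for name, prod_cfg in production.items():
    dr_cfg = recovery_site.get(name)
    if dr_cfg is None:
        print(f"DRIFT: {name} exists in production but not at the recovery site")
    elif dr_cfg != prod_cfg:
        print(f"DRIFT: {name} differs -- production {prod_cfg}, recovery {dr_cfg}")
    else:
        print(f"OK:    {name} is in sync")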
*“Our backup is one size fits all”*

In today’s increasingly complex IT environments, not all applications and
data are created equal. Many organisations default to backing up all their
systems and both transactional and supportive records en masse, using the
same method and frequency. Instead, applications and data should be
prioritised according to business value: this allows each tier to be backed
up on a different schedule to maximise efficiency and, during recovery,
ensures that the most critical applications are restored soonest.
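
A tiering scheme might look something like the sketch below (the tiers,
applications and schedules are hypothetical), with each tier backed up on its
own schedule and restored in priority order:

# Hypothetical backup tiers, prioritised by business value.
tiers = {
    "tier-1 (mission-critical)": {
        "examples": ["order-processing", "payments"],
        "backup_frequency": "continuous replication plus hourly snapshots",
        "restore_priority": 1,
    },
    "tier-2 (important)": {
        "examples": ["CRM", "reporting"],
        "backup_frequency": "nightly incremental, weekly full",
        "restore_priority": 2,
    },
    "tier-3 (supportive records)": {
        "examples": ["archives", "internal wikis"],
        "backup_frequency": "weekly full",
        "restore_priority": 3,
    },
}

# During recovery, restore in priority order so critical services come back first.
for name, tier in sorted(tiers.items(), key=lambda kv: kv[1]["restore_priority"]):
    print(f"{tier['restore_priority']}. {name}: {tier['backup_frequency']} "
          f"-- e.g. {', '.join(tier['examples'])}")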
*“Backing up isn’t moving us forward”*

Backups are not, in isolation, a complete DR solution, but data management
is a critical element of a successful recovery management plan. Whether
you’re replicating to disk, tape or a blend of both, shuttling data between
storage media is achingly slow. And if it takes forever to move and restore
data, then regular testing becomes even less appealing. But foregoing a
regular test restoration process simply because of time-to-restore concerns
is a recipe for data loss in the event of an outage.
*“We don’t have the bandwidth for testing”*

Testing the recovery procedures of individual applications is a very different
exercise from recreating a data centre from scratch. Trying to squeeze the
whole exercise into a 72-hour testing window won’t do – that’s just enough time
to marshal the right employees and ask them to take part in a test that isn’t
part of their core function. So companies often end up winging it with
whatever resources they have on hand, rather than mapping out the people
they need to conduct and validate a truly indicative test.
*“We don’t want to do it… but we’re not keen on someone else doing it”*

Trying to persuade employees that outsourcing recovery is in their best
interests can be like selling Christmas to turkeys.

But in fact, partnering with a recovery service provider actively
complements in-house skill sets by allowing your people to focus on
projects that move your business forward rather than operational tasks. It
is also proven to boost overall recoverability. Managed recovery doesn’t
have to be an all-or-nothing proposition, either, but a considered and
congruous division of responsibilities.

With always-on availability becoming a competitive differentiator, as well
as an operational must-have, you don’t have the luxury of trusting to luck
that your DR plans will truly hold up in the event of a disaster.

The first step to recovery starts with admitting you have a problem and
asking for help.