[BreachExchange] Disaster recovery testing: how to get it right

Thu May 16 20:26:00 EDT 2019

https://continuitycentral.com/index.php/news/technology/4017-disaster-recovery-testing-how-to-get-it-right

Having a solid disaster recovery (DR) strategy in place is imperative, but
if you don’t test it regularly, your business can still be hit hard when
ransomware strikes or a system outage occurs. The purpose of IT
disaster recovery testing is to pinpoint and fix any flaws in your DR plan
well before you find yourself in a real disaster scenario.

To do this, you need to thoroughly scrutinize how well your plan performs,
and allow enough time to resolve any issues before they impact the ability
to restore operations in case of an emergency. Scheduled and frequent
testing is the only way to be certain your organization can be back up and
running quickly following an outage.

To help ensure your testing efforts are effective, follow these five key
steps:

1. Choose technology that makes testing easy. Modern disaster recovery
systems, for example, take frequent image-based backups and replicate
server images to the cloud. When there is a primary server outage,
operations can be restored directly from a backup instance of a virtual
server. This so-called ‘instant recovery’ approach has fundamentally
changed how DR testing is performed, because it allows users to easily
spin up virtual machines locally or in the cloud and test the ability to
restore essential services such as email and database applications.

One word of caution: to avoid conducting ineffective tests, always refer to
the vendor’s guidance first. Many disaster recovery vendors provide a
pre-test checklist of tasks that must be performed prior to testing.
Skipping those tasks can yield inaccurate results, invalidating the entire
testing process.

2. Define the scope of testing. For example, should the test be conducted
in a cloud-based environment that mirrors the production environment, or is
the scope broader? Some tests might even go beyond IT – such as testing an
emergency generator.

There is no single ‘right’ approach; every organization will have to
determine its own specific needs based on how much disruption it can
tolerate during testing, and the amount of time and resources it can
dedicate. However, cutting corners or running incomplete tests is not
advisable, because issues that go undetected may affect restores later.
When defining the test scope, also remember that some of the more radical
test methods carry a risk of data corruption or even data loss.

3. When it comes to the frequency of DR tests, there is no silver bullet
either. A test should be performed every time there has been a significant
change to the production environment, while routine tests may take place
quarterly or every six months, depending on the available resources. It is
also a matter of weighing up risks: some organizations might require more
frequent testing.
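
The scheduling rule in step 3 can be sketched in a few lines of Python.
This is only an illustration: the function name, its parameters, and the
90-day default are assumptions made for the example, not part of any DR
product, and "significant change" is left for each organization to define.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed routine cadence: roughly quarterly. The article suggests quarterly
# or every six months, so adjust this to your own resources and risk.
ROUTINE_INTERVAL = timedelta(days=90)


def dr_test_due(
    today: date,
    last_test: date,
    last_significant_change: Optional[date] = None,
    interval: timedelta = ROUTINE_INTERVAL,
) -> bool:
    """Return True if a DR test should be scheduled (hypothetical helper)."""
    # A significant production change made after the last test always
    # triggers a new test, regardless of the routine cadence.
    if last_significant_change is not None and last_significant_change > last_test:
        return True
    # Otherwise fall back to the routine interval.
    return today - last_test >= interval
```

For example, dr_test_due(date(2019, 5, 16), date(2019, 4, 1)) is False,
because only 45 days have passed, but it becomes True if a significant
change was recorded on date(2019, 5, 1).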

4. Reporting and sharing the results of these tests demonstrates the value
of the DR strategy to the management board and other stakeholders. This
might be part of a formal review meeting or a more informal email report,
but at a minimum it should include details of the test results and proof
that any issues have been resolved, and it should confirm the ability to
recover along with the ongoing validity of the DR strategy. Live testing,
on the other hand, is not recommended because, depending on the outcome,
it can actually decrease confidence in the DR plan.
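
As a rough illustration of the minimum content of such a report (test
results plus proof that issues are resolved), the Python sketch below
builds a plain-text summary. The class and function names are invented
for this example, and the fields are assumptions about what a team might
track rather than a prescribed format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Issue:
    description: str
    resolved: bool = False


@dataclass
class DrTestResult:
    service: str               # e.g. "email" or "database"
    recovered: bool            # could the service be restored in the test?
    issues: List[Issue] = field(default_factory=list)


def summarize(results: List[DrTestResult]) -> str:
    """Build a short text report from DR test results (hypothetical helper)."""
    lines = []
    for r in results:
        status = "recovered" if r.recovered else "NOT recovered"
        open_count = sum(1 for i in r.issues if not i.resolved)
        lines.append(
            f"{r.service}: {status}, {len(r.issues)} issue(s), {open_count} open"
        )
    # The report is only ready to share once every service recovered and
    # every issue found during testing has been resolved.
    all_ok = all(
        r.recovered and all(i.resolved for i in r.issues) for r in results
    )
    lines.append("Overall: " + ("ready to report" if all_ok else "follow-up required"))
    return "\n".join(lines)
```

The output can be pasted into an email or attached to the minutes of a
review meeting; any line showing open issues signals that the test cycle
is not yet complete.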

5. Something that is all too easy to neglect is the comprehensive
documentation of network topology, DR plans, testing processes and test
results. However, documenting everything is important, and there are many
tools on the market to help with this, ranging from fairly basic to highly
comprehensive. The information captured should go beyond IT components and
also include contact lists for support teams, technology vendors, and any
other pertinent information that might be needed following a disaster event.

Ransomware, user error, natural disasters: all of these are very real
threats. Those businesses that can restore operations in the shortest
timeframe will have a competitive edge. No plan and no system is ever
failsafe, but by carefully performing regular DR tests, immediately dealing
with any issues that are identified, and meticulously noting down all
relevant information related to the DR plan, you should be in the best
position possible to cope with all eventualities.