A disaster recovery plan is one of the most important elements of a well-run data center. Sometimes, however, important areas go unaddressed, not intentionally, but because of competing priorities or a “set it and forget it” mindset. To take all reasonable precautions, consider these often-overlooked items in disaster recovery planning.
#1 – Not Thinking of the “Common” Disasters
More often than not, it’s the more typical incidents, like a failed power supply or a software crash, that occur and cause havoc, rather than a large-scale disaster such as an earthquake or a hazardous material spill. These smaller-scale but no less important crises should each have their own “mini-plan” so that steps are in place to address them if and when they occur. Otherwise you’ll be making up a plan as you go in the middle of a disaster.
#2 – Your Technologies Have Cobwebs
Well, hopefully you do not have literal cobwebs. It may be good fortune that you’ve never had to dust off the disaster recovery plan from its spot on the back shelf, but there is a downside to this too: the older the plan, the more likely it is to include obsolete technology and outdated tactics. This could become a major problem should a disaster occur. Review your current disaster recovery plan and conduct a technology audit; determine whether any “old” technologies need alternatives, and if so, what needs to be replaced. The extra effort could save time and money and, most importantly, stave off panic during a disaster and the recovery that follows.
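A technology audit can start as something very simple: a list of the items the plan depends on, each with the date it was last reviewed. The sketch below is only an illustration under stated assumptions. The item names, the dates, the one-year threshold, and the `stale_items` function are all hypothetical, not part of any particular tool.

```python
from datetime import date


def stale_items(inventory, today, max_age_days=365):
    """Return the names of plan items not reviewed within max_age_days."""
    return [
        name
        for name, last_reviewed in inventory.items()
        if (today - last_reviewed).days > max_age_days
    ]


# Hypothetical audit data: plan item -> date it was last reviewed.
inventory = {
    "tape backup procedure": date(2015, 3, 1),
    "offsite replication": date(2023, 11, 15),
}

# A fixed "today" keeps the example reproducible; use date.today() in practice.
flagged = stale_items(inventory, today=date(2024, 6, 1))
for name in flagged:
    print(f"Review needed: {name}")
```

Anything flagged is a candidate for closer review, not automatically obsolete; the right threshold depends on how quickly your environment changes.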
#3 – The Plan Has Never Been Practiced
You may have an airtight disaster recovery plan on paper, but has it ever been put to the test? A practice run is the best bet against failure. Test the plan against the actual conditions in your data center. Business processes, IT operations, and personnel change all the time, and your plan must reflect these shifts. A test run accomplishes several things: you will learn information that could change the plan (for example, that data has been moved from one server rack to another), and you will get an opportunity to iterate on the plan and tighten it up. Testing also helps all participants fully understand the list of recovery priorities for services and data.
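One thing a practice run can surface is drift between what the plan records and what is actually in the data center, such as data that has moved to a different rack. Here is a minimal sketch of that comparison, assuming both the plan and a current inventory are available as simple mappings. The service names, rack labels, and the `find_plan_drift` function are hypothetical.

```python
def find_plan_drift(plan_locations, actual_locations):
    """Return services whose location in the plan differs from the live inventory.

    Each result maps a service to (planned_location, actual_location);
    actual_location is None when the service is missing from the inventory.
    """
    drift = {}
    for service, planned in plan_locations.items():
        actual = actual_locations.get(service)
        if actual != planned:
            drift[service] = (planned, actual)
    return drift


# Hypothetical data: what the plan says versus what the drill found.
plan = {"billing-db": "rack-A3", "mail": "rack-B1", "intranet": "rack-C2"}
actual = {"billing-db": "rack-A3", "mail": "rack-D4"}

drift = find_plan_drift(plan, actual)
for service, (planned, found) in sorted(drift.items()):
    print(f"{service}: plan says {planned}, drill found {found}")
```

Each mismatch is a line in the plan that needs updating before the next real incident.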
#4 – Know Your Service Level Agreements (SLA)
Another area that can fall between the cracks is the details of your SLA and/or support contract with your vendor. Because agreements specify different response levels, you must read and fully understand what your service contracts cover so you can plan around them. For instance, your data center may have what you consider to be an “emergency”, but your contract with third-party vendors and service providers may specify different criteria for what qualifies. Conducting this review may also show that you need to upgrade or downgrade service levels in your contract.
#5 – What About Software Licensing?
Another commonly missed component of disaster recovery planning is software licensing. For instance, many software vendors allow for site licensing that is not active (“passive” licensing), but depending on how the software is set up and deployed, it could become unusable after the disaster recovery backup is initiated. This can lead to serious problems and delays when recovering from backup, including limited storage and processing resources. Don’t let a licensing gap stall the process of getting back to business.
#6 – It’s Not Just the Plan, It’s the People
Sure, you have a communications plan, but what about the people actually doing the work? It’s critical to compile a complete list of responsibilities and tasks, with detailed information on each task area and everyone’s role within the plan. Components would likely include who is accountable, who must be informed, who must be consulted, and the other steps involved in developing and executing each task. For example, who will be responsible for informing clients and customers of the downtime? What specific actions will they take? A detailed people plan ensures that escalation and authorization processes are followed, keeping systems running as smoothly as possible in disaster and recovery modes.
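A people plan like this can be checked mechanically for gaps. The sketch below is one possible way to do it, assuming each task is recorded with the roles attached to it. The task names, role labels, and the `tasks_missing_owner` function are hypothetical examples, not a prescribed format.

```python
def tasks_missing_owner(task_roles):
    """Return tasks that have no one marked as accountable."""
    return [
        task
        for task, roles in task_roles.items()
        if not roles.get("accountable")
    ]


# Hypothetical people plan: task -> roles assigned to it.
task_roles = {
    "notify customers of downtime": {
        "accountable": "support lead",
        "informed": ["sales"],
    },
    "authorize failover": {
        "consulted": ["vendor"],  # nobody accountable yet
    },
}

missing = tasks_missing_owner(task_roles)
for task in missing:
    print(f"No accountable owner: {task}")
```

Running a check like this whenever the plan or the staff list changes helps catch tasks that were left without an owner.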