
Alec


IT Disaster Recovery Plan Steps

What do you do when the server crashes? Or when data is corrupted? Or when some technology problem has taken your entire company offline?

Whether you are a system administrator, Chief Technology Officer, or software developer, you hope this will never happen. But if it does, you had better be ready. Make a mistake during an emergency and the disaster can be 100 times worse.

There are nine steps I have used which work every time, regardless of what kind of IT disaster has happened. They show how to stop the problem from getting worse and how to handle it in the most effective and least stressful way.

What qualifies as a "disaster"?

  • Anything that takes your company offline so employees cannot work
  • Your company website or app goes offline so customers cannot access it
  • Anything that corrupts data
  • A security breach in which sensitive data is stolen

The first thing to realize is that any perceived disaster has multiple possible causes. These steps work regardless of the type of disaster or its cause.

For example, your company website could go offline for a wide variety of reasons, and at first you have no idea how big or trivial the problem is as a "disaster". Let's say you are working remotely and the director of operations calls you from the Ohio office and says the website is down. No employees in Ohio can access the website and the director is starting to panic.

Here's a list of a few possible causes for this example, from trivial to true disaster.

Possible Causes for Web Server Offline

  1. only the Ohio office has a problem with its router or internet access; the rest of the world accesses the website fine
  2. a network link in Ohio was cut and traffic using that network lost access; traffic needs to be rerouted; contact your data center and let them know the details
  3. a renegade admin changed everyone's passwords without notifying management
  4. the data center performed scheduled maintenance and their backup battery failed
  5. some developer uploaded a core file or database configuration file and brought down the website
  6. server software failure: Apache crash, kernel panic, SQL server crash, etc.
  7. DDoS attack on your server
  8. hackers infiltrated your server and encrypted the hard drive for ransom
  9. hard drive is full
  10. hardware failure at the data center; could be firewall, router, network card, one of the servers, etc.
  11. hard drive failure and corruption
  12. database corruption
  13. and many other possible problems
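
A quick way to narrow this list is to test the site yourself from a location outside the affected office. The following is a minimal sketch, not one of the nine steps: it uses only the Python standard library, the hostname `example.com` is a placeholder, and the function names and the wording of the hints are my own assumptions. It checks DNS, then a TCP connection, then an HTTP response, and maps the results to a rough hint about where to look first.

```python
import shutil
import socket
import urllib.error
import urllib.request


def check_dns(hostname):
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False


def check_tcp(hostname, port=443, timeout=5.0):
    """Return True if a TCP connection to hostname:port can be opened."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_http(url, timeout=5.0):
    """Return the HTTP status code, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def disk_percent_used(path="/"):
    """Return the percentage of disk space used (run this on the server)."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def triage(dns_ok, tcp_ok, http_status):
    """Map check results to a rough hint. The wording is illustrative only."""
    if not dns_ok:
        return "DNS: hostname does not resolve from here"
    if not tcp_ok:
        return "NETWORK: no TCP connection (network path, firewall, or server down)"
    if http_status is None:
        return "SERVER: port is open but there is no HTTP response"
    if http_status >= 500:
        return "APPLICATION: server responds with an error (code, config, or database)"
    return "LOCAL: site responds from here, so the problem may be at the reporting office"


if __name__ == "__main__":
    host = "example.com"  # placeholder: replace with your own site
    dns_ok = check_dns(host)
    tcp_ok = dns_ok and check_tcp(host)
    status = check_http("https://" + host + "/") if tcp_ok else None
    print(triage(dns_ok, tcp_ok, status))
```

If the result is the "LOCAL" hint, you are probably looking at cause 1 rather than a true disaster. The `disk_percent_used` helper is meant to be run on the server itself to rule out a full hard drive.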

When the Director of Operations calls and says there is a problem, they have no idea how trivial or terrible the problem is. It's your job to keep a level head and "fix it".

Watch the video for the nine steps to handle any disaster.

There is no doubt that a disaster is a horrible experience. But if you handle it well, you will be known as a hero later.
