For any Web-based business, the IT network is a vital asset with a simple function – to move data from a transmitting device to a receiving device. Maintaining maximum uptime and availability of the network should be the highest priority of any IT administrator in order to keep business operations and processes running smoothly 24/7. However, if something disrupts the communication between devices on the network, it can have damaging effects on both productivity and profitability.

Identifying a problem

Some of the most common problems on a network include:

  • Damaged or unplugged cables
  • Damaged or broken hardware devices
  • Insufficient coverage, unplugged antennas and interfering transmissions, for example from trains or other wireless networks
  • Network traffic congestion and packet loss
  • CPU overload or insufficient disk space
  • Hardware or software malfunctions
  • Insufficient power supply
  • Connectivity problems
  • A high network collision rate
  • Intrusions, e.g. malware attacks

It is relatively easy to determine whether a problem exists, as it will usually cause a sudden change in network traffic patterns. When this happens, ask yourself the following questions to assess the complexity of the problem:

  • Is the change expected?
  • Is it a recurring event?
  • Does the change involve a device or network path?
  • Does the change interfere with vital network operations?
  • Does the change affect one or many devices or network paths?
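
As a rough illustration of spotting the "sudden change in network traffic patterns" described above, the sketch below compares the latest traffic sample against a baseline built from recent history. The function name, the sample values and the threshold of three standard deviations are assumptions chosen for illustration, not a recommendation from any particular monitoring product.

```python
from statistics import mean, stdev

def is_sudden_change(history, latest, threshold=3.0):
    """Return True if `latest` deviates from the baseline formed by
    `history` by more than `threshold` standard deviations."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any difference counts as a change.
        return latest != baseline
    return abs(latest - baseline) / spread > threshold

# Hypothetical traffic samples in Mbit/s (illustrative values only).
samples = [98, 102, 100, 101, 99, 100]
print(is_sudden_change(samples, 101))  # a value close to the baseline
print(is_sudden_change(samples, 12))   # a sharp drop in traffic
```

A check like this only answers whether something changed; the questions above are still needed to judge whether the change is expected, recurring or critical.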

Who to tell when there is a problem

Knowing whom to tell what information, and when to tell it, is crucial when you encounter a problem with your network.

Employees/Colleagues – When there is a problem with your website, you can be sure that your employees and colleagues will hear about it from complaining customers. They need the latest information about a problem, such as an inaccessible or slowly loading Web page, to be best equipped to handle customer queries.

Customers – Honesty (and openness) is always the best policy. Knowing that problems are detected and fixed (sometimes even before they happen) gives your customers the confidence that they can trust your business to provide them with what they need on every visit.

Service Providers – Have you ever tried to return an item to a store without a receipt? Without proof, you usually will not get very far. Service providers are similar. With the thousands of customers they service each day, their support staff may require proof of any problems you have with your service. Providing a detailed analysis to your provider allows them to provide you with an efficient resolution to your issues.

Preparing for that moment of disaster

Planning ahead for a network disaster will help keep disruption to your business to a minimum. There are a number of measures that you can take:

  • Implement Automatic Actions: “Self-healing”

One of the basic preparations for emergencies is to have automatic self-repairing actions in place. For example, in most cases configuring a server to reboot automatically is the fastest way to get it back online.
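
A minimal sketch of such a self-healing action might look like the following. The `Watchdog` class, its parameters and the limit of three consecutive failures are hypothetical; in practice the `check` callable would probe the real service and `restart` would trigger the actual reboot or service restart.

```python
class Watchdog:
    """Restart a service after several consecutive failed health checks."""

    def __init__(self, check, restart, max_failures=3):
        self.check = check              # callable returning True when healthy
        self.restart = restart          # callable performing the restart
        self.max_failures = max_failures
        self.failures = 0

    def tick(self):
        """Run one health check and restart if the failure limit is reached."""
        if self.check():
            self.failures = 0
            return "ok"
        self.failures += 1
        if self.failures >= self.max_failures:
            self.restart()
            self.failures = 0
            return "restarted"
        return "failing"

# Simulated check results instead of a real server probe.
results = iter([True, False, False, False, True])
restarts = []
dog = Watchdog(check=lambda: next(results),
               restart=lambda: restarts.append("reboot"))
states = [dog.tick() for _ in range(5)]
print(states)
```

Requiring several consecutive failures before acting is a design choice that avoids rebooting a server because of a single dropped check.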

  • Implement Notifications: “Alarms”

With instant notifications in place, the moment something goes wrong with your server, the people needed to resolve the problem receive an email, SMS, or instant message informing them about it. They can then take the necessary measures to ensure the issue is resolved.
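
An email alarm of this kind can be sketched with Python's standard library. The addresses, the host name "web01" and the assumption of an SMTP relay on localhost are placeholders for illustration; SMS or instant-message delivery would need a separate gateway that is not shown here.

```python
import smtplib
from email.message import EmailMessage

def build_alert(host, problem, recipients, sender="monitor@example.com"):
    """Compose the alert email (all addresses here are placeholders)."""
    msg = EmailMessage()
    msg["Subject"] = f"[ALERT] {host}: {problem}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(f"Monitoring detected a problem on {host}: {problem}.")
    return msg

def send_alert(msg, smtp_host="localhost", smtp_port=25):
    """Send the alert through an SMTP relay (assumed to be reachable)."""
    with smtplib.SMTP(smtp_host, smtp_port) as smtp:
        smtp.send_message(msg)

# Build the message only; call send_alert(alert) once a relay is available.
alert = build_alert("web01", "HTTP check timed out", ["oncall@example.com"])
print(alert["Subject"])
```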

  • Prepare and Test Disaster Recovery Plans

Preparing contingency plans for emergencies is only half the battle; you also need to test their effectiveness. For example, if your plan includes moving customer traffic to a backup server, you need to test whether it will be able to handle the extra load.
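
Once a load test has measured what the backup server can sustain, the comparison itself is simple arithmetic. The sketch below assumes capacity and peak load are both expressed in requests per second and that 20% of capacity should stay in reserve; the function name and all figures are hypothetical.

```python
def can_absorb_failover(peak_rps, backup_capacity_rps, headroom=0.2):
    """Return True if the backup can carry the peak load while keeping
    a `headroom` fraction of its measured capacity in reserve."""
    usable = backup_capacity_rps * (1 - headroom)
    return peak_rps <= usable

# Illustrative numbers: peak load seen in production versus the
# capacity measured for the backup server in a load test.
print(can_absorb_failover(peak_rps=400, backup_capacity_rps=600))
print(can_absorb_failover(peak_rps=450, backup_capacity_rps=500))
```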

  • Consider Load Balancing and Hot Standby Redundancy for Mission-Critical Systems

Having a standby for mission-critical systems is very important. In the case of an emergency, such as a server crash, you can simply redirect your traffic to the standby system. For example, my company runs a full, nightly updated copy of our main Web site (already running on a load-balanced dual-server setup located in the U.S.) on a second, dedicated server 24/7 (located in Europe). If there is a problem with the first server, we simply change the DNS entry to move all traffic to the backup system. If you require even higher availability or if your Web site is transaction-based (such as an auction Web site), using load balancers to automatically move traffic to another machine in case of failure is the right way to go.
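
The decision behind such a failover can be sketched as "send traffic to the first server in priority order that passes a health check". The server names below are hypothetical, the health check is a plain TCP connection attempt, and actually updating a DNS entry or load balancer is provider-specific and deliberately left out.

```python
import socket

def tcp_check(host, port=80, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def pick_target(servers, is_healthy=tcp_check):
    """Return the first healthy server in priority order, or None."""
    for server in servers:
        if is_healthy(server):
            return server
    return None

# Simulated health states instead of real network probes.
servers = ["us-primary", "eu-standby"]
health = {"us-primary": False, "eu-standby": True}
target = pick_target(servers, is_healthy=lambda s: health[s])
print(target)
```

Keeping the health check as a parameter makes it possible to test the failover logic without touching the network, which fits the earlier advice to test recovery plans before they are needed.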

Any effective disaster contingency planning, however, starts with ongoing network monitoring. This allows IT administrators to accurately recognise and diagnose network disruptions and identify those that are out of the norm. This will ensure that the network, and the business, is running smoothly and, at best, will detect any network problems, failures, and performance issues before they have a chance to affect employees or customers.