“When I came to the job, it was clear the network management had been sidelined for a few months and changes were urgently needed. But they wouldn’t let me touch them until after the Christmas rush.”

The speaker was a young technician recently appointed to head a major retailer’s network team, but his story did not surprise me. In my work I come across a growing number of retailers worried about the risk of system outages during the hectic Christmas to January sales period and some, as in this case, have simply banned system changes for several weeks in advance, and found that it makes a big difference.

More generally, we find that three quarters of all system outages stem directly from human error. The organisations we query typically state that anything from 60 to 80% of their problems eventually trace back to administrative changes in the network – either plainly wrong, or made without a full understanding of the knock-on effects or possible long-term consequences. Forbid changes, and you achieve stable performance – but what a way to do business!

Why so much human error, and why is it on the increase? Let us look at the way the burden of network administration has been building up recently, and what can be done to solve this problem – without freezing important system updates.

Core network services

The fact that networks are growing in size and complexity is obvious and unremarkable – as organisations expand, so must their systems, reaching out to serve more staff and more sites. This is, hopefully, all allowed for in the company’s IT strategy and budgets. But overlaying this evolutionary expansion are other, less obvious pressures on the infrastructure.

If, for example, I want a VoIP phone on my desk as well as on my PC, I simply plug it into the existing network. But if everyone does that, it doubles the number of IP addresses. If I also want a smartphone, a laptop and an iPad for business use, I now have five IP addresses where a couple of years ago I only had one.
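The arithmetic is worth spelling out. The following is only a rough sketch, assuming a hypothetical office of 200 staff on a single /24 subnet (neither figure comes from a real deployment); it shows how quickly a subnet sized for one device per person runs out of room.

```python
import ipaddress

def usable_hosts(cidr):
    """Number of assignable host addresses in an IPv4 subnet."""
    net = ipaddress.ip_network(cidr)
    # Subtract the network and broadcast addresses
    # (valid for prefixes shorter than /31).
    return net.num_addresses - 2

def addresses_needed(staff, devices_per_person):
    """Total addresses if every member of staff has the same device count."""
    return staff * devices_per_person

subnet = "10.0.0.0/24"  # hypothetical office subnet
staff = 200             # hypothetical headcount
capacity = usable_hosts(subnet)

for devices in (1, 2, 5):
    need = addresses_needed(staff, devices)
    status = "fits" if need <= capacity else "exceeds capacity"
    print(f"{devices} device(s) each: {need} of {capacity} addresses, {status}")
```

With one device each the subnet copes; adding a second device per person is already enough to exceed it.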

As enterprises become more distributed, we move from an intranet of computers to an intranet of people who move between locations, and each mobile device may call up further addresses as it hops from location to location. These “virtual workplaces” embrace branch offices, home offices, hotels and airports.

Nor is it only the surge in individuals’ addresses. Building control systems, surveillance and secure entry systems, vending machines, fire detectors and other devices that used to be manually controlled now form part of the corporate network, and are often involved in automated, machine-to-machine communications.

An interesting example comes from the Swedish truck company Scania: it has some half million vehicles on the road all around the world, and all the newer vehicles can be wireless-linked to keep in touch with the corporate HQ, so that each vehicle’s position, performance and service data is available in real time.

The build-up of all these addresses on the network brings to mind the social weaver birds of Southern Africa, adding more and more “addresses” (i.e. nests) to the telegraph pole until it collapses and destroys the communications system. What collapses in this case is not the network itself, of course, but its human support structure, the network management.

The social weaver bird of South Africa’s Kalahari desert weaves its nest out of dry grass with an opening at the bottom. Massive canopies build up in a tree or on a telegraph pole as ever more nests are added. Eventually the weight becomes too much and the tree or pole collapses, trapping the birds inside their nests and keeping the population in check. It is not unlike the problem of network management, trying to maintain core network services as the weight of ever greater numbers of IP addresses builds up in the organisation.

The extraordinary thing is that, while so many maintenance, security and communications functions in an organisation have become automated via the network, the network’s own core services remain a last bastion of manual labour. Although top enterprises like Scania largely use automation, in the majority of companies IP addressing, naming (DNS) and address management (IPAM) are still handled manually using spreadsheets.

Whereas automated virtual server management takes less than a minute to provision a new server, setting up a new server’s IP address manually could take thirty to forty minutes, and the technician callout could take days to arrange. It is not just the time; it is the likelihood of error. As noted earlier, 60 to 80% of network outages eventually trace back to human error in administrative changes. Such minor slips can take months to reveal themselves and can be very hard to trace back. As the workload pressure mounts, so does the likelihood of such errors.

Today’s networks are not only complex, they can also be increasingly opaque. Network managers often do not know what is happening in the network, other than the results, and virtualisation is adding to the problem by creating two distinct layer 2 networks, the physical and the virtual. Virtual networks are nearly invisible to traditional management tools and need proprietary solutions such as VMware VirtualCenter.

The rule of thumb used to be that, if your IP addresses were numbered in hundreds rather than thousands, you might not need to automate. But, as we have seen, the explosion in addresses means that many quite small organisations are now facing the burden. Automating the network infrastructure is no longer just for global corporations; it is becoming a vital necessity for any medium to large company.

The need for automation

Many organisations deploy DNS and DHCP servers with little or no central management, using Microsoft’s Active Directory or BIND (dating from the 1980s). These tools really only serve a single domain with a few hundred addresses. In many organisations IP address management is no more than a paper record or a spreadsheet. The habit begins when there are only a few dozen static hosts, but the system falls apart as the number of addresses soars and, with all the mobile users, the address space becomes increasingly dynamic.
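To see how a spreadsheet record goes wrong, consider the kind of check it never performs for itself. The sketch below is purely illustrative: the host names, addresses and column layout are invented, and it uses only Python’s standard library rather than any particular IPAM product. It scans a spreadsheet export for the classic manual slips: the same address given to two hosts, a mistyped address, and an address outside the intended subnet.

```python
import csv
import io
import ipaddress
from collections import defaultdict

# Hypothetical spreadsheet export: one row per manually assigned host.
SHEET = """hostname,ip
web01,10.0.0.10
db01,10.0.0.11
printer3,10.0.0.10
cam-lobby,10.0.1.300
kiosk7,10.0.2.5
"""

def audit(sheet_text, subnet):
    """Return (duplicates, invalid, out_of_range) found in the records."""
    net = ipaddress.ip_network(subnet)
    seen = defaultdict(list)
    invalid, out_of_range = [], []
    for row in csv.DictReader(io.StringIO(sheet_text)):
        try:
            addr = ipaddress.ip_address(row["ip"].strip())
        except ValueError:
            # Not a well-formed IP address at all (e.g. an octet above 255).
            invalid.append(row["hostname"])
            continue
        if addr not in net:
            out_of_range.append(row["hostname"])
        seen[addr].append(row["hostname"])
    # Any address claimed by more than one host is a conflict.
    duplicates = {str(a): hosts for a, hosts in seen.items() if len(hosts) > 1}
    return duplicates, invalid, out_of_range

duplicates, invalid, out_of_range = audit(SHEET, "10.0.0.0/24")
print("duplicate addresses:", duplicates)
print("malformed addresses:", invalid)
print("outside subnet:", out_of_range)
```

A spreadsheet will happily accept all five rows; a centrally managed service would be expected to reject three of them at the point of entry.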

Rather than rely on free tools, the answer is to automate these core functions using the latest network service management technology. The new devices, rather than just serving a single IP address to a host, allow the entire IP address space of an organisation to be centrally managed in terms of network resource allocation – giving IT managers and CIOs a whole different perspective.

Address management is no longer a tedious housekeeping chore, but rather a powerful tool for security, capacity planning, availability and growth management. Automation does not remove the human element; it helps people apply policies in a consistent, documented manner across the whole network.

Good automation addresses not only the underlying technology, but also the people and the business processes. This is the key to a scalable, smooth operation, and an important enabler for virtualisation. It helps a company to globalise and to manage the complex address space, making applications available to increasingly scattered offices and mobile users, even across multiple carrier networks.

When the IT manager is asked to deploy many extra servers for some new application, there will be no need for emergency re-building or re-partitioning of the network. IP address registration and allocation can be planned in advance, not under battle conditions.
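Planning in advance can be as simple as carving a reserved block into right-sized subnets before the servers arrive. The following is a minimal sketch of that idea, not a description of how any commercial product works: the block 10.20.0.0/22, the request names and the host counts are all assumptions made up for illustration, and the largest-first strategy is just one reasonable choice.

```python
import ipaddress

def prefix_for(hosts):
    """Longest IPv4 prefix whose subnet still holds `hosts` usable addresses."""
    # (hosts + 1).bit_length() is the smallest n with 2**n >= hosts + 2,
    # leaving room for the network and broadcast addresses.
    return 32 - (hosts + 1).bit_length()

def plan(block, requests):
    """Map each request name to a subnet carved out of `block`."""
    free = [ipaddress.ip_network(block)]
    allocations = {}
    # Serving the largest requests first keeps the free space less fragmented.
    for name, hosts in sorted(requests.items(), key=lambda kv: -kv[1]):
        want = prefix_for(hosts)
        for i, net in enumerate(free):
            if net.prefixlen <= want:
                chosen = next(net.subnets(new_prefix=want))
                free.pop(i)
                if chosen != net:
                    # Return the unused remainder of this block to the pool.
                    free.extend(net.address_exclude(chosen))
                free.sort()
                allocations[name] = chosen
                break
        else:
            raise ValueError(f"no room left for {name}")
    return allocations

# Hypothetical requests: name -> number of hosts expected.
allocations = plan("10.20.0.0/22", {"app-servers": 200, "voip": 50, "cameras": 10})
for name, net in allocations.items():
    print(f"{name}: {net}")
```

Because the plan is computed from the whole block at once, overlaps cannot creep in, and a request that does not fit fails loudly at planning time rather than on the day of deployment.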

Automation in practice

Scania is an example of a major global company that sees the wisdom of automating core network services. Scania’s vehicles are sold and supported in over 100 countries, and over 47,000 vehicles were produced in 2009 by some 32,000 employees, of whom 10,000 are directly involved in production.

The company’s global IT network covers everything from IP-enabled vehicles on the road to state-of-the-art manufacturing facilities in seven countries across the globe and more than 1600 sales points. It adds up to a diverse, complex and ever-growing system in which network management plays a vital role in ensuring rock-solid core network services.

These services include domain name resolution and IP address assignment and management. If those core network services fail, the whole network fails: the manufacturing business would grind to a halt, and the speed and efficiency of the network of trucks on the road would be severely compromised. Like most organisations, Scania relied on traditional DNS with BIND and Microsoft DNS, but maintaining the infrastructure of the spreading network became increasingly difficult.

Automating these services does not necessarily involve more software complexity; it can be achieved by standardising on a single platform. So, you can have a happy Christmas!