Case Study: Peering Through The Network Cloud

Using the image of a cloud to depict the Internet is a convenient metaphor. Once the user connects to the ISP, he’s not concerned about how or why the messages arrive where they should. It’s as easy as flicking a light switch and being certain that somehow all those generators and transmission lines will manage to bring 120 volts to your fingertips.

At least that’s how it is supposed to be. At times the Internet has all the reliability of a Third-World power grid. Not so much of a problem if you are only waiting for football scores, but no way to run a business.

“If our overseas sales sites can’t connect to us, they can’t place orders,” says Bill Miller, senior network engineer for Sappi Fine Paper North America in South Portland, Maine.

This is where the cloud-like amorphous form of the Internet can prove a hindrance. Slow connections and dropped packets cost a company real money in terms of both lost productivity and missed sales. But pursuing a remedy often is like driving in London’s “pea soup” fog. With the proper tools, however, you can peer through the cloud to find clear skies.

“Once we had graphs showing high latency and packet losses,” Miller continues, “we were able to get our WAN contractor motivated to make it right and they did.”

Independence

Sappi Fine Paper North America (Sappi) is a division of Sappi, Ltd. of Johannesburg, South Africa, whose 19 mills in Africa, Europe and North America produce more than $4 billion in paper products annually.

Miller oversees Sappi’s 3000-node WAN from the corporate data center in South Portland, Maine. The WAN connects that data center to the headquarters in Boston, as well as four paper plants, seven warehouses, 10 sales offices and a research and development facility. In addition, it links to overseas facilities in Europe, Mexico and Brazil.

Hardware includes an IBM 3090-compatible mainframe, 150 servers running AIX, HP-UX and Windows, and Windows desktops (9x, NT, 2000). The company also contracts out for supplementary mainframe capacity.

Sappi outsourced the monitoring and repair of its WAN, but didn’t want to rely solely on the contractor informing them of problems.

“We wanted something that would independently notify us of a problem on the network,” says Miller. “We also wanted historical information such as frame relay uptime and trending of problems such as frames being discarded or even frames being marked as eligible for discard.”

Miller implemented WebNM by Somix Technologies, Inc. of Portland, Maine. He found it easier to use and far cheaper than other network management tools he’d used, such as Aprisma’s Spectrum.

As its name implies, WebNM is a web-based network management package that monitors and manages any Simple Network Management Protocol (SNMP) device. It also performs software and hardware inventories, enables remote management of desktops and automates trouble ticketing, along with several other functions.

Since it is built mainly of open-source components, companies pay a flat rate regardless of the size of the operation: approximately $30,000, including installation and support.

Miller reports that it took a vendor engineer about an hour to install the software on a Dell server with dual 1.2 GHz processors running Windows 2000. Another day was spent modeling the network and training staff on how to use it.

The core functions of the program use WhatsUp Gold, from Ipswitch, Inc. of Lexington, Mass., to create a network map and provide alerts. Sappi specifically monitors server CPU utilization, memory, available disk space and traffic flow on the interfaces, as well as items such as fans and device temperature if the server has MIB (Management Information Base) extensions covering them.

On the company’s Cisco routers, CPU utilization and memory are watched. All the SNMP information collected from network devices is stored in an SQL database, where it is used for graphing and reporting.
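To illustrate the general idea of keeping polled values in a SQL table and querying them by time window, here is a minimal Python sketch using the standard sqlite3 module. It is not WebNM's schema or code; the table layout, the function names and the device name "router1" are all assumptions made for illustration.

```python
import sqlite3

def open_store():
    """Create an in-memory table of samples (hypothetical schema, not WebNM's)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE samples (device TEXT, metric TEXT, ts INTEGER, value REAL)"
    )
    return db

def record(db, device, metric, ts, value):
    """Store one polled value with its Unix timestamp."""
    db.execute("INSERT INTO samples VALUES (?, ?, ?, ?)", (device, metric, ts, value))

def window_average(db, device, metric, now, seconds):
    """Average a metric over the last `seconds` seconds; None if no samples."""
    row = db.execute(
        "SELECT AVG(value) FROM samples "
        "WHERE device = ? AND metric = ? AND ts > ? AND ts <= ?",
        (device, metric, now - seconds, now),
    ).fetchone()
    return row[0]

if __name__ == "__main__":
    db = open_store()
    record(db, "router1", "cpu", 100, 20.0)
    record(db, "router1", "cpu", 400, 40.0)
    record(db, "router1", "cpu", 1000, 60.0)
    # Average CPU over the last five minutes (300 seconds) as of ts=1000
    print(window_average(db, "router1", "cpu", 1000, 300))
```

The same query with a longer window gives the 30-minute, two-hour or daily view; only the `seconds` argument changes.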

“We have one main graph which shows everything at a glance and the user can then drill down into the sub-interfaces,” says Miller. “With one click I can bring up the last five minutes, 30 minutes, two hours or the day, and look back over the past two years to spot any trends.”

Sappi uses another WebNM module, Logalot, to collect all Syslog and Windows Event Log entries from 300 network objects and store them in a single database. These entries are then handled according to policies established by the administrator, which can include notifying the appropriate personnel or resetting a device if needed.
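Policy-driven log handling of this kind can be sketched in a few lines of Python. The example below is not Logalot's actual policy format, which the article does not describe; the `Policy` fields, the action labels and the sample messages are hypothetical. It assumes standard syslog severities, where 0 is the most severe and 7 is debug.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    keyword: str    # substring to look for in the log message
    max_level: int  # highest syslog severity number that still triggers (0 = most severe)
    action: str     # hypothetical action label, e.g. "notify" or "reset"

def match_actions(entry, policies):
    """Return the actions of every policy that the log entry satisfies.

    `entry` is a dict with an integer 'severity' and a string 'message'.
    """
    actions = []
    for policy in policies:
        severe_enough = entry["severity"] <= policy.max_level
        mentions_keyword = policy.keyword.lower() in entry["message"].lower()
        if severe_enough and mentions_keyword:
            actions.append(policy.action)
    return actions

if __name__ == "__main__":
    policies = [
        Policy("link down", 3, "notify"),
        Policy("fan", 2, "reset"),
    ]
    entry = {"severity": 3, "message": "Interface Serial0: Link Down"}
    print(match_actions(entry, policies))
```

Because every entry passes through one rule set, a single fault produces one decision rather than a separate alert from each log source.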

Miller finds this far more convenient than having to go from log to log to view entries or receiving alerts from many different sources all about one small problem. Instead, alerting is consolidated and IT staff receive alerts via audible speaker, paging, cell phone and/or e-mail.

“We know right away if something really fails,” says Miller.

Using Your Own Eyes

But the primary purpose of the software purchase was to keep WAN connections operating optimally. As soon as Sappi technicians used their own eyes to view what was happening, they discovered all was not well.

“We were paying a certain amount of money for guaranteed throughput to overseas locations, but packets were being marked ‘discard eligible’ or were being dropped even when the traffic load was small,” says Miller. “This enabled us to force the WAN carrier to determine what the issue was with our overseas contractor.”
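The pattern Miller describes, losses occurring while the link is lightly loaded, can be checked from periodic counter readings. The Python sketch below is only an illustration under stated assumptions: each sample holds cumulative octet, frame and discard counters taken at a fixed polling interval, counters never wrap or reset, and the 50 percent utilization and 1 percent loss thresholds are arbitrary. The field names are hypothetical and are not taken from any particular MIB.

```python
def suspicious_intervals(samples, cir_bps, interval_s, max_util=0.5, max_loss=0.01):
    """Flag polling intervals with notable loss despite low utilization.

    samples: list of dicts with cumulative 'octets', 'frames' and 'discards'.
    cir_bps: committed information rate of the circuit in bits per second.
    Returns the indexes of the samples that close each flagged interval.
    """
    flagged = []
    for i, (prev, cur) in enumerate(zip(samples, samples[1:]), start=1):
        frames = cur["frames"] - prev["frames"]
        if frames <= 0:
            continue  # no traffic in this interval, nothing to evaluate
        bits = (cur["octets"] - prev["octets"]) * 8
        utilization = bits / (cir_bps * interval_s)
        loss = (cur["discards"] - prev["discards"]) / frames
        if utilization < max_util and loss > max_loss:
            flagged.append(i)
    return flagged

if __name__ == "__main__":
    samples = [
        {"octets": 0, "frames": 0, "discards": 0},
        {"octets": 240_000, "frames": 1000, "discards": 50},     # light load, 5% loss
        {"octets": 2_640_000, "frames": 6000, "discards": 550},  # full load, 10% loss
        {"octets": 2_880_000, "frames": 7000, "discards": 550},  # light load, no loss
    ]
    print(suspicious_intervals(samples, cir_bps=64_000, interval_s=300))
```

Only the first interval is flagged here: the second shows loss under full load, which a carrier could attribute to congestion, while loss at 10 percent utilization is harder to explain away.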

In addition, Sappi cut overall networking costs. Since the company could now view actual traffic levels on connections, it realized that certain links were over-provisioned. By reviewing the historical data of bandwidth usage, Sappi groomed down those circuits to the levels required, rather than paying for extra bandwidth that would never be utilized.
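One common way to size a circuit from historical usage is to take a high percentile of the observed load, add some headroom and pick the smallest available tier that covers it. The article does not say how Sappi made its decisions, so the Python sketch below is only an assumed approach; the 95th percentile, the 25 percent headroom and the tier list are illustrative values.

```python
def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct% of n
    return ordered[rank - 1]

def recommend_bandwidth(usage_kbps, tiers_kbps, headroom=1.25, pct=95):
    """Pick the smallest tier covering the chosen percentile of usage plus headroom."""
    target = percentile(usage_kbps, pct) * headroom
    for tier in sorted(tiers_kbps):
        if tier >= target:
            return tier
    return max(tiers_kbps)  # nothing is large enough; return the biggest tier

if __name__ == "__main__":
    # Nineteen quiet samples and one short burst, in kbps
    usage = [100] * 19 + [1000]
    print(recommend_bandwidth(usage, [128, 256, 512, 1536]))
```

Using a percentile rather than the peak keeps one brief burst from dictating the size of the circuit.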

One further area may also result in cost savings. Sappi is investigating the possibility of bringing WAN management in-house. Instead of continuing to outsource, Miller believes that WebNM may well be enough for the company to manage the WAN adequately without adding to the IT workload.

“We are moving into a situation now where to save money we are looking at taking over the whole thing ourselves,” says Miller. “The product is so good that we have developed the confidence in its ability to give us rapid notification of any faults.”