Off the grid...
by Chris Josephes
In the spirit of the new O'Reilly Emerging Telephony site, I thought I'd share one of my best (or worst) VoIP horror stories.
Before my current job, I worked at a telecommunications company that wanted to make headway into the VoIP market. The company invested in new network gear, consultants, training, and infrastructure. Their goal was to be the best VoIP company in the whole state.
We filled the market with buzzwords like "converged" and "QoS." We held parties to show off our infrastructure, in the hopes of landing a nice big corporation, or maybe even a government contract. When the product rolled out, we had about ten nice-sized customers. So far, the rollout looked successful.
But all of that would change on one cold, normally uneventful February day. I was sitting at my desk working as always, when the network connection to my PC went down. I double-checked the link light on my network card, but the problem was really confirmed when three or four co-workers asked aloud if there was a problem with the network.
The network support group tried to call the main office to see if they were having network problems. And since I said "tried," you can guess where this tale is heading. No dial tone. No nothing. The Cisco 7940s sitting on everyone's desks suddenly became $800 paperweights.
We reached the other building by using one cell phone to call a co-worker's cell phone. They were aware of the problem, and a network engineer was coming over to look into the situation. It was very likely that the core router had taken a nosedive.
It took about another minute for those words to sink in. If the core router was down, did that mean that all of the customers using our VoIP service were down too? But if they were down, why weren't they calling?
Because they can't.
If you've ever worked for a big ISP, you know that the phones will usually jam up whenever there's some kind of major outage. A large queue of holding calls is a pretty good indicator that there's a big network problem.
This outage was different. This outage was filled with nothing but an eerie silence. The network support team had a major outage, but there was no way to gauge how the customers were coping. We didn't even know if they were aware of a problem.
The customer database was unreachable, so we couldn't proactively look up numbers and call. We had some contact numbers written on paper, so we reverted to cell phones. That worked, except for the customers who had unwittingly listed their VoIP number as an emergency point of contact.
We were able to confirm the worst. Everything was down. Networks, phones. The new network infrastructure that was built from the ground up had died.
It was rumored that the head engineer was on site and working to fix the problem. He carried with him the telltale laptop with a light blue RJ45 serial cable trailing behind it. It was also rumored that the Chief Operating Officer was right next to him, wanting desperately to be kept in the loop. They would both be staring at a 15-inch LCD screen with cell phones pressed up against their ears.
About four hours later, the problem was fixed. I'll admit I don't remember what the actual cause of the problem was, or the resolution. It could have been an IOS bug, a routing table that had gotten out of hand, or just a bad configuration upload. At the time, we were just happy to get the phones working.
The aftermath wasn't pretty. Some customers were completely unaware of the outage until we could finally call them. Miraculously, none of the customers left us. New promises and reassurances were made, along with the expectation that things would get better down the road.
After the incident was over, things around the company changed. New procedures went into place, policies would be updated, and upper management would be keeping a watchful eye on everything that happened from then on.
I'm not writing this to disparage VoIP, VoIP product lines, vendors, or providers. I am only writing this because it is my most detailed memory of my brief encounter with a VoIP environment. I still think it's a great technology, but there needs to be a higher level of maturity and stability before I'm ready to adopt it for personal use.