
#1 Article on Voice Report's Top 10 List

SONET Ring is Culprit in Air Tower’s 2-Hr. Blackout

Published Oct. 11, 2007 (Vol. 28, No. 20) 

Nearly two hours without phone service ... 200 planes stranded in the air ... More than 100 flights grounded, delayed or diverted in a dozen cities ...

Before the telephone, radio and radar blackout at the Federal Aviation Administration’s Memphis air traffic control center on Sept. 25, the FAA and just about any communications technology professional who counts on uninterrupted telecom service would have confidently told you the same thing: A SONET ring is a solid, dependable way of making sure your network maintains redundancy in a disaster.

Now, more than a few telecom managers aren’t so sure.

AT&T officials have thus far been cryptic about what happened in Memphis late last month. E-mailed statements to Voice Report confirm only that an OC-192 network circuit – equivalent to 5,376 DS-1s – was indeed down between 11:38 a.m. and 1:30 p.m. central time, and “equipment failure” was to blame.
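For readers who want to check the figures, here is a minimal sketch of the arithmetic. It assumes the standard SONET multiplexing ratios (an OC-N signal carries N STS-1 payloads, and each STS-1 carries 28 DS-1s); the function names are illustrative only and do not come from AT&T.

```python
from datetime import datetime

# Standard SONET multiplexing ratio (an assumption stated in the lead-in):
# each STS-1 payload carries a DS-3's worth of traffic, i.e. 28 DS-1s.
DS1_PER_STS1 = 28


def ds1_equivalents(oc_level: int) -> int:
    """Number of DS-1s that fit in an OC-N circuit (N STS-1 payloads)."""
    return oc_level * DS1_PER_STS1


def outage_minutes(start: str, end: str) -> int:
    """Length of an outage in minutes, given 24-hour HH:MM times on one day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


print(ds1_equivalents(192))              # 5376
print(outage_minutes("11:38", "13:30"))  # 112
```

By this reckoning the circuit was down for 112 minutes, which matches the "nearly two hours" reported.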

But Voice Report interviews with officials at the FAA and the air traffic control union, as well as several telecom experts, point to a bigger concern: The failure occurred on a SONET ring, a service purchased by enterprises for the very purpose of avoiding such critical outages, and was compounded by provisioning mistakes. Read on to determine the steps you need to take to make sure your enterprise doesn’t confront a similar disaster.

Faulty MUX Card Blamed for Initial Failure...

It makes sense that the FAA would purchase a SONET ring from BellSouth – now part of AT&T – for its Memphis air traffic control center.

The tower handled almost 3 million flights last year, ranking it ninth among the 20 centers in the continental U.S. responsible for communicating with planes above 18,000 feet. Several thousand pilots flying over portions of seven states rely on the center daily to keep them from crashing into each other. [See chart demonstrating configuration of BellSouth SONET ring.]

So, when telephone, radio and three out of the Memphis center’s 13 radar stations went dark Sept. 25, it wasn’t just all flights within 250 miles of Memphis that were shut down. At the Dallas-Fort Worth International Airport, for example, 50 flights reportedly were delayed.

In a desperate attempt to ensure the safety of planes already in the air, controllers at the Memphis facility breached FAA policy by using personal cell phones to communicate flight paths and radio frequencies to other centers, recounts Doug Church, spokesman for the National Air Traffic Controllers Association (NATCA).

“It kind of goes without saying that controllers are going to do whatever they have to do to protect safety,” Church says. “They’ll move heaven and hell to do it [even] if it means breaking a policy.”

A communication blackout of unprecedented proportions is how Patrick Forrey, NATCA’s president, described the incident during a hearing by the U.S. House of Representatives’ Subcommittee on Aviation – coincidentally scheduled the next day to discuss the recent epidemic of flight delays. “We have never had an outage involving this much airspace for this long a period of time,” Forrey told the subcommittee, according to transcripts of the hearing.

Earlier during the hearing, Rep. Steve Cohen (D-Tenn.) had surprised Acting FAA Administrator Robert Sturgell with questions about the outage. “Does this incident, Mr. Administrator, indicate to you that there is a need for more backup systems or more security?” Cohen asked. “This was not a security problem, but do we have security at the telephone facilities that, if they were struck, could destroy our capacity to have an air transportation system?”

Sturgell’s answer: “We are still investigating, but at this point it is a Bell South/AT&T problem, and of course we will be, you know, discussing this with them, as we have been since it occurred, to figure out what the problem was and whether our system should be routed differently at this location and at other places to ensure more redundancy or better reliability.”

FAA spokeswoman Laura Brown, reached at home while walking her dogs, gave Voice Report more details a few days later. It was the removal of a “corrupted card,” apparently referring to a MUX interface card, that caused the network to fail, Brown says. “We have backups and backups and backups and backups,” she says, describing the multiple connections the air traffic control center has to the ring. “We are dependent on the redundancies that are supposed to be inherent in a SONET ring.”

Because SONET rings are made of fiber optic lines with more bandwidth than any one customer could use individually, multiplexers (MUXes) are needed to break the bandwidth down into channels that go to enterprises, explains Gary Audin, a telecom consultant who has provided carriers with SONET ring management advice. A MUX will have one interface card connecting to the SONET ring on the “back side” and several interface cards connecting enterprises to the ring on the “front side,” says Audin, president of Delphi Inc., in Arlington, Va.

Chris Lee, managing director for Fairfax, Va.-based Source Loop and a former MCI network engineer, says roughly 10 card failures occur yearly across the various networks he helps his more than 50 clients manage. Power surges, manufacturer defects and normal wear and tear are all typical causes for card failures, Audin adds.

...But Bad Provisioning Also Likely at Fault

But the failure of a single MUX card still doesn’t explain why communication was not restored on the SONET ring’s backup channel. That’s where human error played a major role, suggests Ron Carpenter, president of the NATCA’s Memphis branch.

Carpenter says AT&T has been less than forthcoming as air traffic control officials try to investigate the outage, describing the circumstances that led to the Sept. 25 blackout as “proprietary.” But despite the carrier’s silence, Carpenter says he and others have determined that Melbourne, Fla.-based Harris Corp., acting as the FAA’s provisioning agent, ordered the backup channel to run on the same line as the primary channel, leaving the network without a failover when the interface card was removed.

A Harris Corp. spokesman, contacted earlier this week, said he was unable to respond in time to Voice Report’s inquiries. The publicly traded systems integrator boasts on its Web site of signing a 15-year, $3.5 billion contract with the FAA in 2002 to modernize telecom infrastructure at 5,000 FAA facilities.

It wouldn’t be the first time such a mistake was made, Audin says. A Wall Street client of his, located in the World Trade Center, found itself in just such a position years ago, he recounts. The enterprise used its SONET ring for three years without a hiccup, but found out during a primary channel outage that the backup channel wasn’t wired correctly, though the installation tech had signed off on it.

It’s also possible that a MUX on the SONET ring used a dual-interface card, which connects to both the primary and backup channels, Audin says. Pulling the dual-interface card would have disrupted connectivity on both channels, defeating the redundancy a SONET ring is meant to provide. Dual-interface cards are used to save space and money, he notes.
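Both scenarios come down to the same flaw: a component that sits on the primary and backup channels alike. As a rough sketch only, the check below compares two path inventories and lists anything they share. The component names are hypothetical placeholders, not the actual Memphis configuration, which has not been disclosed.

```python
def shared_elements(primary, backup):
    """Return components that appear on both paths, in primary-path order.

    Any item returned is a single point of failure for both channels.
    """
    backup_set = set(backup)
    return [element for element in primary if element in backup_set]


# Hypothetical inventories for illustration only.
primary_path = ["mux-card-A", "fiber-east", "pop-1"]
backup_path = ["mux-card-A", "fiber-west", "pop-2"]      # shares a card
diverse_backup = ["mux-card-B", "fiber-west", "pop-2"]   # fully separate

print(shared_elements(primary_path, backup_path))     # ['mux-card-A']
print(shared_elements(primary_path, diverse_backup))  # []
```

An empty result is what a properly provisioned ring should show; a check like this is only as good as the circuit records the carrier is willing to hand over.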

Are other enterprises in danger of sharing the Memphis air traffic control center’s recent experience?

AT&T says it has “initiated a comprehensive investigation” and “worked directly with its equipment vendor to evaluate similar equipment platforms throughout the AT&T network to ensure the highest levels of reliability, and to develop a software update for the equipment.” Following testing, the carrier says it will install its update “in relevant platforms” throughout the network. “These actions will help to ensure that a similar equipment failure does not occur in the future,” AT&T says.

7 Tips to Ensure SONET Works as Expected

Published Oct. 11, 2007 (Vol. 28, No. 20) 

Enterprises spooked by the nearly two-hour communication blackout at a Memphis air traffic control center might consider deploying satellite phones or even adding a third carrier to their SONET rings for extra redundancy, suggests telecom consultant Gary Audin, president of Delphi Inc. in Arlington, Va.

Louis Armstrong New Orleans International Airport installed laptop-sized Broadband Global Area Network (BGAN) satellite modems after Hurricane Katrina to complement the landlines in place at the enterprise, reports John Lyon, the airport’s telecom manager [VR 07/25/06]. The satellite data service could be used in conjunction with a VoIP provider to maintain voice services when the landlines go down, he explains.

But note that satellite backup isn’t cheap. Lyon estimates his modems cost between $2,500 and $3,000 each.

Adding a second or third carrier isn’t cheap either, Audin warns. In fact, you could more than double your costs, since you likely will be one of the carrier’s first customers in the area and be forced to incur its significant capital expenses. Another carrier wouldn’t have helped the Memphis air traffic control center anyway, points out Chris Lee, managing director for Fairfax, Va.-based Source Loop. Sprint and AT&T were both providers on the ring, but the corrupted card likely was pulled from the local legacy BellSouth gear, which connects the Memphis facility to both the AT&T and Sprint POPs, he speculates.

A few more tips to make sure your SONET rings and other communications contingency services will do the job when needed:

• Run traffic on the SONET backup channel for 24 hours once a month, Audin advises the Memphis air traffic control facility and other enterprises for which communications are of the utmost importance. Communications technology pros often hesitate to test backup methods for fear that operations will be disrupted, but testing a “fault-tolerant” SONET with little to no failover time should have minimal impact, he says.

Add a clause demanding such testing under the “disaster recovery” section of your contract, Audin recommends. Should a failure occur during testing, the carrier should be instructed to revert to the original configuration.

• Get engineering spec updates from your carrier every year, recommends Robert Harris, IT director for the Baton Rouge-based Louisiana Lottery Corp.

In 1992, Harris’ enterprise thought it had an AT&T T-1 heading west out of the building, a Sprint T-1 heading east and an MCI T-1 heading north. But after a single cut cable resulted in losing all three T-1s, Harris says he learned that all three lines belonged to one carrier, which resold them to other carriers. The lottery figured it was better off buying all three T-1s from AT&T. But Harris held AT&T to a contractual guarantee that the T-1 routes would be diverse and he would have access to the plans to prove it.

Anticipate carrier push-back when requesting such design layout records, Source Loop’s Lee warns. “They’re very cagey about how circuits are routed,” he says. “They claim after 9/11 that we can’t tell you because it might fall into the wrong hands.”

If you are successful in convincing the carrier to show you the specs, expect your request for records to delay your contract for a couple of months, as the carrier surveys the layout for you, he says.

• When examining the layout of your lines, note that the channels of a SONET ring should be deployed at least 50 feet apart and come up through two separate manholes, Lee adds. You risk losing both in a single backhoe incident if your carrier deploys the channels too close together, he warns.

• Stipulate that your SONET provider have spare cards on hand, Lee suggests. He recalls the hours it took a CLEC’s techs to drive to New Jersey to get a replacement card for an enterprise’s SONET ring in Washington, D.C.

• Institute a “fail-soft” policy for events when only some of your communications are disrupted, Audin adds. Such a policy could have instructed air traffic controllers to use cell phones in an emergency, instead of banning them in all circumstances as the FAA’s policy reportedly does.

• Consider installing a microwave radio system between buildings in line of sight from each other, Louisiana Lottery’s Harris suggests.

You’ll need to mount at least two directional antennas and connect them with cables to a radio the size of a stereo tuner inside your building, explains wireless expert Michael Finneran, president of dBrn Associates, in Hewlett Neck, N.Y. The lower the frequency, the farther apart your antennas can be. Finneran reports a 2-gigahertz radio system with antennas 30 miles apart.

Expect pricing to start from $10,000, depending on the number of channels and drops, he says.
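Finneran’s point about frequency and distance can be illustrated with the standard free-space path loss formula, which gives the signal loss in decibels over a clear line-of-sight path. This is a minimal sketch, not a link design: it ignores antenna gain, terrain, rain fade and licensing, and the 6-gigahertz comparison figure is an assumption chosen for illustration.

```python
import math

KM_PER_MILE = 1.609344


def free_space_path_loss_db(distance_km: float, frequency_mhz: float) -> float:
    """Free-space path loss in dB for a distance in km and a frequency in MHz."""
    return 20 * math.log10(distance_km) + 20 * math.log10(frequency_mhz) + 32.44


distance_km = 30 * KM_PER_MILE  # the 30-mile link Finneran describes

loss_2ghz = free_space_path_loss_db(distance_km, 2000)  # 2 GHz
loss_6ghz = free_space_path_loss_db(distance_km, 6000)  # 6 GHz (hypothetical)

print(f"2 GHz over 30 miles: {loss_2ghz:.1f} dB")
print(f"6 GHz over 30 miles: {loss_6ghz:.1f} dB")
```

Over the same distance the higher frequency loses more signal, so a lower-frequency system can span a longer hop before the loss becomes unworkable.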
