FCC Investigating Widespread CenturyLink Outage

CenturyLink says it’s in contact with the Federal Communications Commission (FCC) and will cooperate with its investigation into the telco’s nationwide outage that affected 911 service for numerous consumers across the country.

CenturyLink's Linda Johnson

The lengthy outage, which also affected some Verizon customers, occurred on Dec. 27. Restoration of services began that day and network traffic had normalized as of Dec. 29, said CenturyLink spokeswoman Linda Johnson.

FCC chairman Ajit Pai said he spoke with CenturyLink to “underscore the urgency of restoring service immediately.” If the Trump administration’s government shutdown continues, the FCC might shut down most of its operations Thursday.

“When an emergency strikes, it’s critical that Americans are able to use 911 to reach those who can help,” he said. “The CenturyLink service outage is therefore completely unacceptable, and its breadth and duration are particularly troubling. I’ve directed the [FCC’s] Public Safety and Homeland Security Bureau to immediately launch an investigation into the cause and impact of this outage. This inquiry will include an examination of the effect that CenturyLink’s outage appears to have had on other providers’ 911 services.”

For a full recap, read our timeline of the CenturyLink outage, from the moment it was reported to the FCC investigation.

The outage impacted voice, IP and transport services, and CenturyLink’s visibility into its network-management system, impairing its ability to troubleshoot and prolonging the duration of the outage, Johnson said.

“The outage was caused by a faulty network management card from a third-party equipment vendor that caused invalid traffic replication,” she said. “Steps are being taken to help prevent the issue from reoccurring.”

CenturyLink has established a network-monitoring plan for key parameters that can cause this type of outage, based on advice from the third-party equipment vendor, Johnson said. Enhanced visibility processes will “quickly identify and terminate invalid packets from propagating the network,” she said.

“This will be jointly and regularly evaluated by the third-party equipment vendor in conjunction with CenturyLink network engineering to ensure the health of the affected nodes,” Johnson said.


5 comments

  1. Alan McLean January 2, 2019 @ 5:13 pm

    A management network should ride on top of, and rarely interact with, bulk data transport: just often enough to monitor node data flow and pick up traffic statistics. Management protocols are by default a high-priority service, but with small bandwidth needs. Traffic is defined by particular frame, packet or port types. Multiple steps can and should be taken on any major backbone link to restrict management packets to minimal backbone capacity. Yet one single card replicating management traffic takes out a nationwide network for 2 days, including VoIP and 911 service? Not buying it, CenturyLink, but if it’s true, you just provided a great reason to consider SD-WAN service.

  2. Jon Wolf January 9, 2019 @ 1:20 pm

    RespOrg.com offers a toll-free disaster recovery product that allows for carrier redundancy in front of the carrier wall. Having control over your business’s toll-free numbers, and ensuring their uptime, is paramount to a business’s ability to provide customer support, acquire new customers, and retain existing customers.

  3. Dawn Bozeman January 11, 2019 @ 12:14 pm

    It is a shame that so many customers were affected by the outage. We need to always remember that outages can be caused by many factors. Lost connections from cable cuts or other means are different from electronic component failures. Today, we see SD-WAN as an opportunity to re-route communications in the event of a circuit being degraded or lost. This is a great way to ensure connectivity to cloud-based applications, as you can have multiple routes in an SD-WAN design. However, it is imperative that the APPLICATION being connected to also has a type of redundancy, or at least a monitoring system, to allow a quick response should a component fail.

  4. Will Gifford January 15, 2019 @ 2:31 pm

    Thanks for the article, Edward! Outages like this are scary for any business or organization because we never know when they might happen. teleira.com allows you to prepare for outages just like this one by backing up your communication systems via satellite and cloud. That way, if something like this happens again, instead of waiting for your carrier to fix the problem, you can reroute your communications through satellite in minutes and continue business as usual.

  5. Ben Stiegler January 16, 2019 @ 12:01 pm

    And the other article on this lays the blame at the feet of ‘a major US based equipment manufacturer whose products are embedded in our network.’ Interesting. Some of us are old enough to remember when (pre-ATT-breakup) a software bug in their Unix-based switching software took most of the US long distance infrastructure off the air for, I think, close to 24 hrs. A software condition caused each toll switch to ask its neighbors to please reboot. The network couldn’t stay up because the nodes were overwhelmed with reboot commands coming from trusted neighboring nodes. I call upon CL to be a LOT more transparent about exactly what happened, and why the detection and remediation took so long. And I hope the FCC does, too.
