Airline outages show need for backup plans


NOW NOT BOARDING

Airlines have been hit by periodic computer problems as they have grown increasingly reliant on technology to process passengers. Some recent examples:

United, July 2015: Failed router grounds all flights for about two hours, snarling operations for most of the day.

Southwest, July 2016: Faulty router leads to systemwide shutdown after backup systems don't kick in. Problem was fixed within 12 hours or so but it took much longer to reset operations.

Delta, August 2016: Hardware failure leads to power outage at Atlanta computer center. Not all servers have backup power, preventing smooth restart and crippling operations for three days.

British Airways, this week: Problems in a new check-in system led to outages that snarled flights for a day or so.

In the hours after the Delta Air Lines computer outage last month, the airline resorted to an archaic form of passenger processing: writing out boarding passes by hand.

Even after systems came back up, some were still slow, and gate agents in some cases had to manually enter codes from boarding passes.

The episode resulted in a huge black eye for Delta, with more than 2,000 flight cancellations over several days affecting people all over the world. The manual processes slowed check-in and boarding, delaying flights that did operate.

The Delta outage also sent a warning to any company that’s heavily dependent on technology for daily operations: Have a Plan B ready if things go haywire.

Such companies include UPS, which also operates an airline in its massive shipping business and whose employees use hand-held readers to log package movements and deliveries.

The list could also include hospitals, financial technology firms and a wide range of other businesses.

UPS, based in Sandy Springs, said it regularly reviews business continuity plans and conducts exercises.

“UPS has built redundancies throughout our global network to ensure continual operations for our customers,” the company said. UPS also said it has technical experts available “to respond rapidly and minimize any impacts on service to our customers.”

Yet Delta’s outage — one of several that have afflicted airlines — showed how challenging it is for a technology-dependent company to reach back to old practices that are unfamiliar to employees, time-consuming and not designed for the size or speed of the current operation.

“Some of the efficiencies gained by technology have made it a little more difficult for airlines to go more manual,” said Joe Leader, chief executive of the Airline Passenger Experience Association.

“With electronic flight bags and dispatch, operational control, routing for aircraft, boarding passes and other elements required for travel, there are so many variables in play that you can’t conduct airline operations without a baseline of technology.”

The outage also highlighted how difficult it is to devise backup procedures for when computer systems are down, or at least procedures that allow operations to continue while complying with security and safety rules.

Paul Jacobson, Delta’s chief financial officer, said the airline is making investments over the next couple of years “to minimize the probability that that ever happens again.”

Manual code entry

Brian Myers, a traveler who lives in Atlanta and flew out on the first day of Delta’s outage, said boarding of his flight took nearly two hours because some passengers’ boarding passes wouldn’t scan correctly, and an agent had to manually enter a series of codes from a spiral notepad.

It was “at least a 10-minute process,” Myers said. “It’s crazy how dependent they are on all the electronics working.”

Even when methods such as handwritten boarding passes appear to be manual, they often still rely on digital data, for example to check passenger records and flight schedules, according to Leader.

Delta spokesman Morgan Durrant said the airline has “manual airport procedures in the event of a rare loss of power or network connectivity at an airport gate to safely and securely keep aircraft, bags and customers moving.”

Each airline has an outage procedure approved by the Transportation Security Administration that covers validity of boarding passes, TSA spokesman Mark Howell said.

Delta also turned to some unconventional backup procedures, including using its Delta Private Jets subsidiary to fly about 40 passengers from Atlanta to their final destinations.

Those backups, to be sure, were a last resort when Delta’s entire computer system crashed after a power control module failed at Delta’s technology command center on Aug. 8. The company discovered that about 300 servers were not connected to backup power, which meant the system could not be smoothly restarted.

Technology-dependent companies everywhere took note, said Tony Cooper, spokesman for the Technology Association of Georgia.

“I think that everyone’s taking notice and reevaluating their best practices and procedures to avoid this problem,” he said.

Status check interrupted

While some technology companies like Facebook or Twitter might have a brief outage, “it doesn’t cause a disruption of this magnitude. People can’t check status updates or they can’t check their Twitter feed,” Leader said. “If it happens to an airline … that can be incredibly significant.”

While Delta’s outage was one of the most disruptive glitches in recent memory, and Southwest Airlines also had hundreds of cancellations after a router failure in July, it’s not a new issue. Airlines deal with small and large outages regularly, whether it’s a brief problem in one part of an airport terminal or a glitch in a particular program, or a broader outage.

Last week was British Airways’ turn, when problems with a new check-in system snarled operations for a day or so.

Southwest has “outage kits” at every airport where it operates, to give employees at ticket counters and gates instructions on how to process passengers during an outage, said Biljana Obrenic, customer service manager of regulatory compliance.

“With any outage, we get so used to the ability to use our systems [that] when they go down, you kind of hyperventilate and you panic for a moment,” said David Woodard, Southwest’s director of ground operations standards. “Then you have to take a quick breath and say, okay, we have a procedure.”

Labor-intensive work

Southwest agents can write out boarding passes and manually fill out baggage tags, and gate agents can use manual boarding lists to check names against IDs. That labor-intensive work slows the process down.

“It takes more time, because you’re doing more writing and verifying and double- and triple-checking,” Obrenic said.

One advantage: A majority of Southwest’s customers check in 24 hours in advance, since under Southwest’s boarding procedures that gives passengers a better boarding position. That makes it more likely that Southwest customers will arrive at the airport with a boarding pass already printed out or saved on their mobile phones, reducing the disruption caused by a brief outage at the check-in counter, according to Southwest.

“It’s really worked in our favor when we do have an outage,” Woodard said.

Leader said that in the wake of the Delta and Southwest outages, he believes airlines are “taking a harder look at what can be done in the event of systems going down.”

“These are some of the most complicated infrastructures in the world, combining layers of technology, physical aircraft, people that all need to work seamlessly together 100 percent of the time,” he said.