  1. #1
    WHT-BR Top Member
    Join Date
    Dec 2010
    Posts
    17,966

    [EN] Power surge at British Airways data center causes flight chaos

    On Saturday morning, a brief power surge knocked out British Airways’ communications systems, grounding the carrier’s entire London operations. Nearly 600 flights were canceled.

    While about 95 percent of flights are running, thousands of customers are still being re-routed. More than two-thirds of the 75,000 affected passengers were scheduled to reach their final destination by the end of Monday.

    Analysts estimated the number of people due compensation at closer to 170,000.

    The cost of rebooking and compensating customers is estimated at about 100 million euros.

    Peter Sayer
    May 29, 2017


    A major British Airways IT crash has highlighted the importance for businesses of testing backup systems and disaster recovery procedures to ensure that they work as planned.

    The airline experienced what CEO Alex Cruz described as "a major IT systems failure" that, he said, affected all check-in and operational systems.

    The failure on Saturday, May 27, resulted in the delay or cancellation of hundreds of flights, leaving thousands of passengers stranded at London's Heathrow Airport on a holiday weekend. Things were still not back to normal two days later.

    Cruz described the cause of the failure as "a power supply issue," without going into detail.

    A spokeswoman for British Airways elaborated.

    "It was a power supply issue at one of our U.K. data centres. An exceptional power surge caused physical damage to our infrastructure and as a result many of our hugely complex operational IT systems failed," she said.

    Before you ask: "We do have a back-up system," the spokeswoman said.

    But?

    "But on this occasion it failed."

    British Airways isn't the first airline to be laid low by a power failure. Delta Air Lines suffered similarly in August 2016 when a switch box carrying power into the company's headquarters failed, grounding flights worldwide.

    A single point of failure had also brought down systems at Southwest Airlines the previous month, although on that occasion the problem was in a network router.

    Although British Airways had more than one data center, it's not inconceivable that the same power surge could have damaged two sited close together.

    Back in 2012 the company revealed that it had two data centers on sites right next to its Waterside global headquarters near Heathrow. Those sites housed 500 cabinets in six halls according to Sunbird, the company that supplied the airline's DCIM (data center infrastructure management) system.

    So far, British Airways doesn't know why its backup plans failed. IT staff have spent the last two days getting systems up and running again, and aren't done yet.

    "When the customer disruption is completely over, we will undertake an exhaustive investigation to find out the exact circumstances and most importantly ensure that this can never happen again," said the spokeswoman.

    That was probably Delta's intention too -- until its IT systems went down again in January 2017, resulting in the cancellation of around 150 flights. That time around, the U.S. Federal Aviation Administration said "automation issues" had caused the flight cancellations.

    Between ticket refunds and compensation payments British Airways, like Delta before it, will be hundreds of millions out of pocket as a result of the failure of its backup systems.

    http://www.pcworld.com/article/31988...ght-chaos.html
    Last edited by 5ms; 30-05-2017 at 15:49.

  2. #2
    Remember that old story: if the cost of protection is higher than what you would spend on eventual compensation, you just let it ride. By that reasoning, with 100+ million euros at risk, it is not credible that the infrastructure was a makeshift add-on.
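
    The reasoning above can be put as a rough expected-loss comparison. This is only a sketch: the 100 million euro figure is the estimate quoted in post #1, while the protection cost and the outage probability below are made-up numbers for illustration, and the function names are hypothetical.

```python
def worth_protecting(protection_cost, outage_probability, outage_cost):
    """Return True when the expected outage loss exceeds the protection cost."""
    expected_loss = outage_probability * outage_cost
    return expected_loss > protection_cost


def break_even_probability(protection_cost, outage_cost):
    """Outage probability at which protection cost equals the expected loss."""
    return protection_cost / outage_cost


# Only this figure comes from the thread (estimate quoted in post #1).
OUTAGE_COST_EUR = 100_000_000

# Assumed values, purely illustrative.
PROTECTION_COST_EUR = 5_000_000
OUTAGE_PROBABILITY = 0.10

if __name__ == "__main__":
    print(worth_protecting(PROTECTION_COST_EUR, OUTAGE_PROBABILITY, OUTAGE_COST_EUR))
    print(break_even_probability(PROTECTION_COST_EUR, OUTAGE_COST_EUR))
```

    With these assumed inputs, protection pays off whenever the chance of such an outage is above the break-even probability, which is the point of the argument: the larger the sum at risk, the harder it is to justify skimping.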

  3. #3

    Passengers still without bags

    The airline said it hoped most of a backlog of tens of thousands of bags would be delivered by the end of Wednesday.

    Gwyn Topham, Julia Kollewe and Alex Hern
    30 May 2017 20.34 BST

    British Airways is still struggling to return baggage to passengers, salvage its battered reputation and investigate the cause of a computer system outage that left 75,000 people stranded at the weekend.

    Cancelled holidays and chaotic scenes at Heathrow and Gatwick airports over the bank holiday weekend have been followed by reports of passengers’ baggage being forwarded to destinations abroad while its owners had been unable to travel.

    The airline said it hoped most of a backlog of tens of thousands of bags would be delivered by the end of Wednesday. It also pledged to compensate all passengers, with the bill expected to reach more than £100m.

    Around £500m was wiped off the market value of BA’s owner, IAG, in early trading in London on Tuesday, although shares later recovered to stand 1.4% down, cutting the value of the airline group by £170m.

    BA said it was still investigating a power issue that it says caused the check-in and operating systems to crash, and left the airline struggling to locate and contact even its own staff.

    https://www.theguardian.com/business...ys-it-meltdown

  4. #4

    British Airways says IT chaos was caused by human error

    5 June 2017

    Willie Walsh, chief executive of IAG, said an engineer disconnected a power supply, with the major damage caused by a surge when it was reconnected.

    He said there would now be an independent investigation "to learn from the experience".

    However, some experts say that blaming a power surge is too simplistic.

    Mr Walsh, appearing at an annual airline industry conference in Mexico on Monday, said: "It's very clear to me that you can make a mistake in disconnecting the power.

    "It's difficult for me to understand how to make a mistake in reconnecting the power," he said.

    He told reporters that the engineer was authorised to be in the data centre, but was not authorised "to do what he did".

    IAG has commissioned an "independent company to conduct a full investigation" into the IT crash and is "happy to disclose details" of its findings, Mr Walsh said. The name of the company involved had not been disclosed.

    The BBC reported last week that senior company executives at IAG were pushing for an external probe into the computer meltdown.

    BA had said that a power surge caused the computer problem, but gave little further explanation.

    Scepticism

    However, an email leaked to the media last week suggested that a contractor doing maintenance work inadvertently switched off the power supply.

    The email said: "This resulted in the total immediate loss of power to the facility, bypassing the backup generators and batteries... After a few minutes of this shutdown, it was turned back on in an unplanned and uncontrolled fashion, which created physical damage to the systems and significantly exacerbated the problem."

    But the BBC's transport correspondent, Richard Westcott, has spoken to IT experts who are sceptical that a power surge could wreak such havoc on the data centres.

    BA has two data centres about a kilometre apart. There are question marks over whether a power surge could hit both. Also, there should be fail-safes in place, our correspondent said.

    BA has already launched its own internal investigation, led by its chief executive Alex Cruz.

    Following the computer crash, which caused travel chaos for people travelling from Heathrow and Gatwick, Mr Walsh gave his full backing to Mr Cruz.

    BA and IAG also rejected claims that the incident was due to Mr Cruz's decision to outsource the airline's IT department to India as part of cost-cutting measures.

    Also on Monday, Mr Walsh apologised again for the incident, saying: "When you see customers who suffered, you wouldn't want it to happen to any airline or any business."

    He added: "I wouldn't suggest for one minute we got communications right at BA, we didn't."

    http://www.bbc.com/news/business-40159202
    Last edited by 5ms; 06-06-2017 at 19:36.

  5. #5

    Damage to networking hardware made backup systems useless, says CEO

    Tanwen Dawn-Hiscox, Peter Judge
    31 May 2017

    ...

    BA chief Alex Cruz is leading the inquiry into the outage with the assistance of several power supply specialists. The airline has two data centers, Boadicea House and Comet House, with three halls in each, located close to Heathrow airport. The failure evidently resulted in both data centers going down, but details are still scarce on how this could have happened.

    ...

    The official line is that the failure happened in two stages. According to BA, the problems started when one of the UPS systems at Heathrow’s Boadicea House data center, which was powered by a combination of mains, battery and diesel, “was shut down”.

    ...

    The biggest question is why the IT services in the primary data center didn’t immediately fail over to the secondary data center.

    Cruz’s answer so far is that power was brought back to Boadicea House. The IT equipment was powered up in an “uncontrolled fashion,” causing a surge and “catastrophic physical damage” to the communications hardware, disrupting all 200 systems across the airline’s network. This makes little sense, as Boadicea should have been out of use and the services running from Comet House by then.

    ...

    The inquiry now seeks to determine ... why the IT services were not switched over to the secondary facility.

    http://www.datacenterdynamics.com/co.../98395.article

  6. #6
    I must have missed the episode of this soap opera where it was revealed that the services were replicated and that there was a primary data center and a secondary one (stand-by?). I also missed the episode explaining which data center was used to restore operations.
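
    For reference, this is roughly what a primary/stand-by arrangement is supposed to do. It is a minimal sketch under assumptions, not a description of BA's setup, which none of the articles above actually spell out: the `Site` class, the site names and the `choose_active` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A data center as seen by a (hypothetical) failover controller."""
    name: str
    healthy: bool


def choose_active(primary: Site, secondary: Site) -> Site:
    """Pick the site that should serve traffic.

    Prefer the primary; fall back to the secondary only when the primary
    is unhealthy; fail loudly when neither site is usable.
    """
    if primary.healthy:
        return primary
    if secondary.healthy:
        return secondary
    raise RuntimeError("no healthy site available")


if __name__ == "__main__":
    # Hypothetical sites: the primary has lost power, the stand-by is fine.
    primary = Site(name="site-a", healthy=False)
    secondary = Site(name="site-b", healthy=True)
    print(choose_active(primary, secondary).name)
```

    The last branch is the case the thread keeps circling around: if whatever links the two sites is itself damaged, both look unhealthy at once and there is nothing left to fail over to.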
