  1. #1
    WHT-BR Top Member
    Join Date
    Dec 2010
    Posts
    18,556

    [EN] Google's wide-scope 2-hour cloud outage was caused by updates

    Liam Tung
    February 10, 2017

    A botched software update triggered last month's two-hour outage affecting Google Compute Engine (GCE) instances, cloud VPNs, and network load balancers.

    While the incident wasn't as serious as a past network outage, Google had promised a full explanation due to the "wide scope" of this one, which dropped connections to all GCE instances, cloud VPN tunnels and network load balancers that were created or live-migrated on Monday, January 30.

    "We apologize for the wide scope of this issue and are taking steps to address the scope and duration of this incident as well as the root cause itself," said Google's Cloud Platform engineers.

    The outage was triggered by a "large set" of updates to Google's load-balancing gear; the failure itself occurred when those updates got jammed during testing in a canary deployment.

    "All inbound networking for GCE instances, load balancers and VPN tunnels enter via shared layer 2 load balancers. These load balancers are configured with changes to IP addresses for these resources, then automatically tested in a canary deployment, before changes are globally propagated," explained Google.

    "The issue was triggered by a large set of updates, which were applied to a rarely used load-balancing configuration. The application of updates to this configuration exposed an inefficient code path, which resulted in the canary timing out. From this point all changes of public addressing were queued behind these changes that could not proceed past the testing phase," it added.

    Google's short-term response was to increase the canary timeout phase so that if the same series of errors occurs, it will only slow network changes rather than completely stop them. Over the longer term, it plans to improve the inefficient code path.

    Google has also begun work to "replace global propagation of address configuration with decentralized routing".

    "This work is being accelerated as it will prevent issues with this layer having global impact," it said.

    http://www.zdnet.com/article/google-...ur-cloud-down/

  2. #2
    WHT-BR Top Member
    Join Date
    Dec 2010
    Posts
    18,556

    Google Compute Engine Incident #17003

    Incident began at 2017-01-30 10:54 and ended at 2017-01-30 12:50 (all times are US/Pacific).

    ISSUE SUMMARY

    On Monday 30 January 2017, newly created Google Compute Engine instances, Cloud VPNs and network load balancers were unavailable for a duration of 2 hours 8 minutes. We understand how important the flexibility to launch new resources and scale up GCE is for our users and apologize for this incident. In particular, we apologize for the wide scope of this issue and are taking steps to address the scope and duration of this incident as well as the root cause itself.

    DETAILED DESCRIPTION OF IMPACT

    Any GCE instances, Cloud VPN tunnels or GCE network load balancers created or live migrated on Monday 30 January 2017 between 10:36 and 12:42 US/Pacific were unavailable via their public IP addresses until the end of that period. This also prevented outbound traffic from affected instances and load balancing health checks from succeeding. Previously created VPN tunnels, load balancers and instances that did not experience a live migration were unaffected.

    ROOT CAUSE

    All inbound networking for GCE instances, load balancers and VPN tunnels enter via shared layer 2 load balancers. These load balancers are configured with changes to IP addresses for these resources, then automatically tested in a canary deployment, before changes are globally propagated.

    The issue was triggered by a large set of updates which were applied to a rarely used load balancing configuration. The application of updates to this configuration exposed an inefficient code path which resulted in the canary timing out. From this point all changes of public addressing were queued behind these changes that could not proceed past the testing phase.
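    To make that failure mode concrete, here is a minimal sketch (in Python, not Google's code) of a strictly ordered propagation queue gated by a canary test with a timeout: one change that exercises a slow code path times out in the canary, and every change queued behind it is blocked. All names, timings and the "rare_config" marker are invented for illustration.

    Code:
    from collections import deque

    CANARY_TIMEOUT = 5.0  # seconds the canary test may take before it is declared stuck


    def canary_test_duration(change):
        """Pretend canary: a change to the rarely used config hits a slow code path."""
        return 60.0 if change["config"] == "rare_config" else 1.0


    def propagate(queue):
        """Apply queued address changes strictly in order, as a global pipeline would."""
        applied = []
        while queue:
            change = queue[0]
            if canary_test_duration(change) > CANARY_TIMEOUT:
                # Canary times out: this change cannot proceed, and because the queue
                # is strictly ordered, nothing behind it proceeds either.
                return applied, list(queue)
            applied.append(queue.popleft())
        return applied, []


    updates = deque([
        {"ip": "203.0.113.10", "config": "common"},
        {"ip": "203.0.113.11", "config": "rare_config"},  # exposes the slow path
        {"ip": "203.0.113.12", "config": "common"},       # stuck behind it
    ])
    done, stuck = propagate(updates)
    print(f"applied: {len(done)}, blocked: {len(stuck)}")  # applied: 1, blocked: 2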

    REMEDIATION AND PREVENTION

    To resolve the issue, Google engineers restarted the jobs responsible for programming changes to the network load balancers. After restarting, the problematic changes were processed in a batch, which no longer reached the inefficient code path. From this point updates could be processed and normal traffic resumed. This fix was applied zone by zone between 11:36 and 12:42.

    To prevent this issue from reoccurring in the short term, Google engineers are increasing the canary timeout so that updates exercising the inefficient code path merely slow network changes rather than completely stop them. As a long term resolution, the inefficient code path is being improved, and new tests are being written to test behaviour on a wider range of configurations.
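    Continuing the sketch above, the short-term mitigation amounts to raising the canary timeout above the slow path's duration, so a change hitting the inefficient code path merely delays the queue instead of stalling it. The durations here are again invented.

    Code:
    SLOW_PATH = 60.0  # assumed duration of the inefficient code path
    FAST_PATH = 1.0


    def total_propagation_time(durations, canary_timeout):
        """Return total time if every change passes the canary, or None if one times out."""
        total = 0.0
        for duration in durations:
            if duration > canary_timeout:
                return None  # canary times out; the queue stalls behind this change
            total += duration
        return total


    durations = [FAST_PATH, SLOW_PATH, FAST_PATH]
    print(total_propagation_time(durations, canary_timeout=5.0))    # None: queue stalled
    print(total_propagation_time(durations, canary_timeout=120.0))  # 62.0: slow, but completes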

    Google engineers had already begun work to replace global propagation of address configuration with decentralized routing. This work is being accelerated as it will prevent issues with this layer having global impact.

    Google engineers are also creating additional metrics and alerting that will allow the nature of this issue to be identified sooner, which will lead to faster resolution.

    https://status.cloud.google.com/incident/compute/17003

  3. #3

  4. #4
    WHT-BR Top Member
    Join Date
    Dec 2010
    Posts
    18,556
    No, this is not the list of locations affected by the umpteenth screw-up from the giant of failures. This is the entire infrastructure and, as you can see, with services not even available across its 6 measly locations, which doesn't stop the hired press from ranking the flop ahead of Microsoft, IBM, and Oracle, which have not only incomparably larger infrastructure but, more importantly, customers.

  5. #5
    WHT-BR Top Member
    Join Date
    Dec 2010
    Posts
    18,556

    Pokémon Go - Highlights

    Sebastian Moss
    12 July 2016

    Google Cloud-powered Pokémon Go struggles under heavy demand

    Participants in the social phenomenon that is Pokémon Go have suffered from downtime and dropped connections after the game saw unprecedented demand. Developer Niantic Labs has delayed the title’s global roll-out to ensure that existing players can access the free-to-play augmented reality game.

    Datacenter Dynamics believes the game is hosted on Google Cloud Platform, which has been looking to nab high profile clients after gaining Snapchat as a customer in 2014.

    Niantic did not reply to requests for comment, but job listings uncovered by DCD look for employees to “create the server infrastructure to support our hosted AR/Geo platform underpinning projects such as Pokémon GO using Java and Google Cloud.”

    This is perhaps not all that surprising, considering Niantic was originally an internal startup at Google, founded by Google Earth’s John Hanke and spun out in 2015. Google still owns a stake in the venture, having invested a share of $30 million in Niantic alongside Nintendo and The Pokémon Company in October 2015.

    With Google Cloud Platform seemingly struggling to cope with the heavy data demands of the video game service, some have poked fun at the downtime.

    Amazon Web Services CTO Werner Vogels tweeted: “Dear cool folks at @NianticLabs please let us know if there is anything we can do to help! (I wanted that drowzee)”.

    ...

    http://www.datacenterdynamics.com/co...68.fullarticle


    Alex Hern
    30 September 2016

    If you played Pokémon Go anywhere near its launch date, you probably noticed that it broke. A lot.

    There was always the suspicion that its instability was because the servers were falling over under the weight of the traffic, but today, there’s confirmation of that, from the unlikely source of Google.

    ...

    Google says it managed to “seamlessly” add extra capacity to enable Niantic “to stay well ahead of their record-setting growth”, which is a vaguely rose-tinted recollection of the actual launch, although the company’s director of customer reliability engineering, Luke Stone, does concede that “Not everything was smooth sailing at launch!”

    https://www.theguardian.com/technolo...traffic-google



    James Wright
    19th January 2017

    Pokemon GO Not Working: Server DOWN as Niantic report game offline GLOBALLY

    POKEMON GO is down for users around the world, with server issues reportedly causing the popular app to stop working as creator Niantic provides an update.

    Pokemon GO is not working around the world with numerous users reporting the game is down.

    Pokemon GO servers appear to be offline and it's currently unclear what is causing the issues with the game, which had gone months without a major outage.

    Players, including ourselves, are seeing an error message which reads "unable to Authenticate" pop up whenever they attempt to sign into the game.

    “It's come up with the 'Sign up with'... screen and when I click 'Google' it says failed to authenticate,” one user wrote on Twitter.

    As avid Pokemon GO players, we can also testify to being unable to log in whilst journeying home.

    In response, developers Niantic have confirmed the current problems on Twitter, explaining that the Pokemon Go servers are down and telling fans: "We are currently experiencing server issues. We are working on a fix & will provide an update when resolved. Thank you for your patience."

    So far Niantic has failed to offer any explanation as to what might be causing today's server outage which, according to server tracking websites, appears to be affecting Pokemon Go players worldwide.

    Back when the game launched last year, Pokemon GO servers went down on numerous occasions.

    However, typically this was due to the game being rolled out to millions of users worldwide.

    The latest Pokemon GO update was launched in the last day and was made available in the UK as of the early hours of this morning.

    However, it's unclear if this latest update has had any impact on the Pokemon GO server problems currently experienced by fans.

    http://www.dailystar.co.uk/tech/gami...t-Game-Offline

  6. #6
    WHT-BR Top Member
    Join Date
    Dec 2010
    Posts
    18,556

    NHS - Highlights

    Kat Hall
    1 Feb 2017

    Google is blocking access to the entire NHS network

    Google mistakes high NHS web traffic for cyber attack.

    Google is blocking access to the entire NHS network, mistaking the amount of traffic it is currently receiving for a cyber attack.

    An email from an NHS trust's IT department seen by The Register confirmed that the US search giant has mistaken the current traffic levels for a botnet.

    The email headed "Google Access" stated: "Google is intermittently blocking access due to the amount of traffic from NHS Trusts Nationally (This is not being blocked by the IT Department).

    "This is causing Google to think it is suffering from a cyber-attack.

    "We are advising staff to use an alternative search engine i.e. Bing to bypass this problem.

    The source said they did not know why Google had suddenly decided to block access to the NHS net, but confirmed it was the "go-to resource" for a lot of clinicians.

    The NHS is one of the biggest employers in the world, with 1.2 million people working for the organisation.

    Google refused to comment but said: "It is not correct to say we have blocked the entire NHS network." ®

    Updated at 12:31 UTC on Wednesday 1 February to add: An NHS Digital spokesman contacted The Reg to say: “We are aware of the current issue concerning NHS IP addresses ... This would appear to be due to the high number of people using our systems and trying to access Google at peak times. We are currently in discussion with Google as to how we can help them to resolve the issue.”

    https://www.theregister.co.uk/2017/0..._for_a_botnet/

  7. #7
    WHT-BR Top Member
    Join Date
    Dec 2010
    Posts
    18,556
    The NHS's 1.2 million employees didn't start accessing Google's sites on February 1, 2017, did they?

    Could the giant be under attack?

    A few days ago I posted about Google halting the delivery of DMARC reports, something indirectly confirmed by the figures on dmarcian.com.

    When delivery resumed, I noticed a significant drop in fraud attempts. It fell abruptly from 3,000/day to 1,500, and in Thursday's report, 920 incidents. Even if that is still 50x the number from any other provider, it looked like something had been done. But in yesterday's report, 9,400 (nine thousand four hundred) incidents! And it didn't stop there. I received spam carrying a VIRUS sent through Gmail. A cheap, well-known virus. I don't remember ever seeing anything like it. Gmail opening the gates and letting spam flood through, yes; letting viruses through, no. It makes you wonder whether the giant's infrastructure really has the scale they brag about.
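    For reference, a minimal sketch of how per-day incident counts like those above could be tallied from DMARC aggregate (RUA) reports. The per-day folder path is hypothetical and the "both SPF and DKIM failed" criterion is an assumption about what counts as an incident; the element names (record, row, count, policy_evaluated, dkim, spf) follow the standard DMARC aggregate report XML format.

    Code:
    import xml.etree.ElementTree as ET
    from pathlib import Path


    def failed_message_count(report_path):
        """Sum message counts for records where both SPF and DKIM evaluated to fail."""
        root = ET.parse(report_path).getroot()
        total = 0
        for record in root.iter("record"):
            row = record.find("row")
            policy = row.find("policy_evaluated")
            if policy.findtext("dkim") == "fail" and policy.findtext("spf") == "fail":
                total += int(row.findtext("count"))
        return total


    # Hypothetical layout: one folder of aggregate report XML files per day.
    reports_dir = Path("dmarc_reports/2017-02-03")
    daily_total = sum(failed_message_count(p) for p in reports_dir.glob("*.xml"))
    print(f"{reports_dir.name}: {daily_total} failing messages")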

  8. #8
    WHT-BR Top Member
    Join Date
    Dec 2010
    Posts
    18,556
    Back to "normal"


  9. #9
    WHT-BR Top Member
    Join Date
    Dec 2010
    Posts
    18,556


    Giant of trickery.

    Someone seems to be fabricating traffic.
