  1. #1
    WHT-BR Top Member
    Join Date
    Dec 2010
    Posts
    14,981

    [EN] The myth of zero downtime

    by Lydia Leong

    Every time there’s been a major Amazon outage, someone always says something like, “Regular Web hosters and colocation companies don’t have outages!” I saw an article in my Twitter stream today, and finally decided that the topic deserves a blog post. (The article seemed rather linkbait-ish, so I’m not going to link it.)

    It is an absolute myth that you will not have downtime in colocation or Web hosting. It is also a complete myth that you won’t have downtime in cloud IaaS run by traditional Web hosting or data center outsourcing providers.

    The typical managed hosting customer experiences roughly one outage a year. This figure comes from thirteen years of asking Gartner clients, day in and day out, about their operational track record. These outages are typically related to hardware failure, although sometimes they are related to service provider network outages (often caused by device misconfiguration, which can obliterate any equipment or circuit redundancy). Some customers are lucky enough to never experience any outages over the course of a given contract (usually two to three years for complex managed hosting), but this is actually fairly rare, because most customers aren’t architected to be resilient to all but the most trivial of infrastructure failures. (Woe betide the customer who has a serious hardware failure on a database server.) The “one outage a year” figure does not include any outages that the customer might have caused himself through application failure.

    The typical colocation facility in the US is built to Tier III standards, with a mathematical expected availability of about 99.98%. In Europe, colocation facilities are often built to Tier II standards instead, for an expected availability of about 99.75%. Many colocation facilities do indeed manage to go for many years without an outage. So do many enterprise data centers — including Tier I facilities that have no redundancy whatsoever. The mathematics of the situation don’t say that you will have an outage — these are merely probabilities over the long term. Moreover, there will be additional failures caused by human error. Single-data-center kings who proudly proclaim that their one data center has never had an outage have gotten lucky.
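    For a sense of scale, here is a minimal Python sketch that converts those availability figures into expected hours of downtime per year. The Tier II and Tier III numbers are the ones quoted above; the Tier I and Tier IV values are the commonly cited Uptime Institute figures, included only for comparison:

    ```python
    # Expected annual downtime implied by data center tier availability.
    # Tier II and Tier III figures are from the article; Tier I and
    # Tier IV are the commonly cited Uptime Institute values.
    HOURS_PER_YEAR = 24 * 365  # 8,760

    tiers = {
        "Tier I (no redundancy)": 0.99671,
        "Tier II": 0.9975,
        "Tier III": 0.9998,
        "Tier IV": 0.99995,
    }

    for tier, availability in tiers.items():
        downtime = (1 - availability) * HOURS_PER_YEAR
        print(f"{tier}: ~{downtime:.1f} hours of expected downtime per year")
    ```

    That works out to roughly 22 hours a year for Tier II versus under 2 hours for Tier III; and again, these are long-run expectations, not guarantees.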

    The amount of publicity that a data center outage gets is directly related to its tenant constituency. The outage at the 365 Main colocation facility in San Francisco a few years back was widely publicized, for instance, because that facility happened to house a lot of Internet properties, including ones directly associated with online publications. There have been significant outages at many other colocation facilities over the years, though, that were never noted in the press — I’ve found out about them because they were mentioned by end-user clients, or because the vendor disclosed them.

    Amazon outages — and indeed, more broadly, outages at large-scale providers like Google — get plenty of press because of their mass effects, and the fact that they tend to impact large Internet properties, making the press aware that there’s a problem.

    Small cloud providers often have brief outages — and long maintenance windows, and sometimes lengthy maintenance downtimes. You’re rolling the dice wherever you go. Don’t assume that just because you haven’t read about an outage in the press, it hasn’t occurred. Whether you decide on managed hosting, dedicated hosting, colocation, or cloud IaaS, you want to know a provider’s track record — their actual availability over a multi-year period, not excluding maintenance windows. Especially for global businesses with 24×7 uptime requirements, it’s not okay to be down at 5 am Eastern, which is prime-time in both Europe and Asia.
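    To put numbers on the maintenance-window point, here is a small worked example with hypothetical figures: a provider with a four-hour maintenance window every month and a single two-hour unplanned outage per year:

    ```python
    # How maintenance windows, once counted, can dwarf unplanned outages.
    # Both figures below are hypothetical, chosen for illustration.
    HOURS_PER_YEAR = 24 * 365

    maintenance_hours = 4 * 12   # hypothetical 4-hour window each month
    unplanned_hours = 2          # hypothetical single 2-hour outage per year

    availability = 1 - (maintenance_hours + unplanned_hours) / HOURS_PER_YEAR
    print(f"Actual availability: {availability:.2%}")  # ~99.43%
    ```

    Such a provider could truthfully advertise better than 99.9% uptime "excluding scheduled maintenance" while actually being down about 50 hours a year.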

    Sure, there are plenty of reasons to worry about availability in the cloud, especially the possibility of lengthy outages made worse by the fundamental complexity that underlies many of these infrastructures. But you shouldn’t buy into the myth that your local Web hoster or colocation provider necessarily has better odds of availability, especially if you have a non-redundant architecture.

  2. #2
    Aspiring Guru
    Join Date
    Feb 2011
    Posts
    83
    Interesting!!

  3. #3
    WHT-BR Top Member
    Join Date
    Dec 2010
    Posts
    14,981
    Google has published a paper investigating the main cause of downtime on the company's storage servers. If you think it's hard drives giving up the ghost, memory with Alzheimer's, fans, and other hardware problems, you may be surprised to learn that the number one cause is operating system upgrades that require a reboot and, worse, often go wrong, leaving the server unable to restart. Hence Microsoft's clever idea of creating two domains for VM redundancy: fault and update. You want not only your VMs placed on independent infrastructure, but also your software upgrades performed in batches at different times.
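    For the curious, here is a minimal Python sketch of the fault-domain/update-domain idea behind Azure availability sets. The domain counts and the round-robin placement below are illustrative, not Microsoft's actual placement algorithm:

    ```python
    # A minimal sketch of the fault-domain / update-domain idea.
    # The counts and round-robin placement are illustrative only.
    NUM_FAULT_DOMAINS = 2    # separate racks: independent power/network/hardware
    NUM_UPDATE_DOMAINS = 5   # rebooted one at a time during host OS upgrades

    vms = [f"vm-{i}" for i in range(6)]

    placement = {
        vm: (i % NUM_FAULT_DOMAINS, i % NUM_UPDATE_DOMAINS)
        for i, vm in enumerate(vms)
    }

    for vm, (fd, ud) in placement.items():
        print(f"{vm}: fault domain {fd}, update domain {ud}")
    ```

    The point is that the two axes are independent: a rack failure takes out one fault domain, while a planned host upgrade reboots one update domain at a time, so either event leaves most of the VMs running.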
    Last edited by 5ms; 09-06-2013 at 22:08.

  4. #4
    Aspiring Guru
    Join Date
    Feb 2011
    Posts
    83
    In Brazil, would things be any different?

  5. #5
    Moderator
    Join Date
    Oct 2010
    Location
    Rio de Janeiro
    Posts
    2,678
    Quote Originally Posted by 5ms View Post
    Google has published a paper investigating the main cause of downtime on the company's storage servers. If you think it's hard drives giving up the ghost, memory with Alzheimer's, fans, and other hardware problems, you may be surprised to learn that the number one cause is operating system upgrades that require a reboot and, worse, often go wrong, leaving the server unable to restart. Hence Microsoft's clever idea of creating two domains for VM redundancy: fault and update. You want not only your VMs placed on independent infrastructure, but also your software upgrades performed in batches at different times.
    They learned from their own faults... ;-)

  6. #6
    WHT-BR Top Member
    Join Date
    Dec 2010
    Posts
    14,981
    Quote Originally Posted by FelipeGabriel View Post
    In Brazil, would things be any different?
    I think it's the same thing in Brazil.

    In my opinion, it depends a lot on the sector of the economy and the type of company or institution. If the organization holds a significant market position and its core business requires mission-critical applications, it is common to find top-notch infrastructure, equipment, and staff. That said, every combination can be found in the market, and often within the same company: there is the company with mediocre staff and excellent equipment, the institution with obsolete equipment and brilliant staff, and the organization with poor equipment and unprepared staff. It is also common for one division to have the best of everything while another runs on junk.
    Last edited by 5ms; 10-06-2013 at 15:15.

  7. #7
    WHT-BR Top Member
    Join Date
    Dec 2010
    Posts
    14,981
    This map shows the downtime and incidents of all 350,000+ Pingdom customers during the last hour on a rolling basis.

    https://www.pingdom.com/livemap/

  8. #8
    Member
    Join Date
    Jun 2011
    Location
    Dallas, TX
    Posts
    20
    Quote Originally Posted by 5ms View Post
    Google has published a paper investigating the main cause of downtime on the company's storage servers. If you think it's hard drives giving up the ghost, memory with Alzheimer's, fans, and other hardware problems, you may be surprised to learn that the number one cause is operating system upgrades that require a reboot and, worse, often go wrong, leaving the server unable to restart. Hence Microsoft's clever idea of creating two domains for VM redundancy: fault and update. You want not only your VMs placed on independent infrastructure, but also your software upgrades performed in batches at different times.
    I would never have imagined that this would be the number one cause.

    But there is software for dedicated servers that solves this problem. Here at Limestone we use "Ksplice - Uptrack" for our customers. This program applies operating system updates and upgrades without requiring a reboot.
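    For anyone curious what that looks like in practice, here is a minimal Python sketch that drives the standard Uptrack command-line tools (uptrack-show and uptrack-upgrade), assuming the Ksplice Uptrack client is installed and the script runs as root:

    ```python
    # A minimal sketch driving the Ksplice Uptrack CLI from Python.
    # Assumes the Uptrack client is installed and this runs as root.
    import subprocess

    def installed_updates() -> str:
        """List the rebootless kernel updates currently applied."""
        result = subprocess.run(
            ["uptrack-show"], capture_output=True, text=True, check=True
        )
        return result.stdout

    def apply_all_updates() -> None:
        """Apply all available updates to the running kernel, no reboot."""
        subprocess.run(["uptrack-upgrade", "-y"], check=True)

    if __name__ == "__main__":
        apply_all_updates()
        print(installed_updates())
    ```

    Under the hood, Ksplice patches the running kernel in memory, so the machine picks up kernel fixes without the reboot (and the attendant risk of not coming back up) that the Google paper identifies as the top cause of downtime.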
    Felipe Fleury - Limestone Networks - International Account Specialist
    Cloud, Dedicated & Enterprise Hosting - Premium Network - Passionate Support
    Resell Dedicated Servers @LimestoneInc - limestonenetworks.com - 877.586.0555
