Amazon cloud sinks, smothers Web 2.0 darlings

Whole swathes of Web 2.0 disappeared to the dark side of the cloud today, as an outage at one of Amazon's EC2 data centres torpedoed the likes of Quora and Reddit. The service's status site showed that things started going haywire just over five hours ago, when techies began "investigating latency and error rates with EBS …

COMMENTS

This topic is closed for new posts.
  1. Combat Wombat
    FAIL

    It's stuff like this..

    that should be held up whenever your boss mentions any hint of going to the "cloud".

    If this were your business... you'd be hosed, and you'd have zero control over getting the problem resolved.

    Not only that, but because you're not a super-platinum, high-paying customer, you are condemned to sit in a queue until they get to you.

    Fornicate everything about the "cloud" with an iron stick.

    If you don't control the iron, you don't control the risk.

    1. Don Mitchell

      Data Center Failures

      And you think if you roll your own data center, it will be more reliable and cost effective?

      1. Combat Wombat
        Boffin

        Well....

        It might be more costly, but once you factor in the total cost of an outage... there is no comparison.

        So far... reddit has been down for about 8 hours.

        So, let's do some rough maths, based on a 500-man company that moves its services to the magical cloud.

        So... your cloud solution shits itself and dies... so you now have about 500 people sitting around twiddling their thumbs.

        Say the average pay across the company is $30/hr.

        So 500 x $30/hr x 8 hrs = $120,000.

        That 8-hour outage has just cost you $120,000 in productivity, not to mention the loss of reputation with your customers etc.

        Now, I will BET that the guys who write the contracts for cloud hosting have various "all care, no responsibility" clauses written into the contract, which means you can't recoup any of that loss.

        Say this sort of outage happens five times in a year.

        That's about $600,000 pissed away because you don't control your servers or devices.
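A quick way to check the rough maths above, taking the commenter's illustrative figures (500 idle staff, $30/hr average pay, an 8-hour outage, five outages a year) at face value. The function names are hypothetical, and the sketch counts only lost productivity, not the reputational or contractual costs the comment also mentions.

```python
# Rough outage-cost arithmetic. All inputs are the commenter's illustrative
# assumptions, not measured data.

def outage_cost(staff: int, hourly_pay: float, outage_hours: float) -> float:
    """Lost productivity for one outage: idle staff x pay rate x duration."""
    return staff * hourly_pay * outage_hours


def annual_outage_cost(per_outage: float, outages_per_year: int) -> float:
    """Total lost productivity if the same outage recurs several times a year."""
    return per_outage * outages_per_year


per_outage = outage_cost(staff=500, hourly_pay=30.0, outage_hours=8)
per_year = annual_outage_cost(per_outage, outages_per_year=5)

print(f"Per outage: ${per_outage:,.0f}")  # Per outage: $120,000
print(f"Per year:   ${per_year:,.0f}")    # Per year:   $600,000
```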

        Now add all the risk factors surrounding security of your data.

        Not to mention there is no way of independently verifying that your cloud provider is doing what it says it is, or that its data center is not staffed with inbred, lead-poisoned, brain-damaged monkeys.

        I stand by my earlier statement. Companies who trust the cloud, get everything they deserve.

      2. Charlie Clark Silver badge
        FAIL

        Reliability and cost-effectiveness are different things

        Running your own data centre? You'd be crazy if you didn't have a fallback one. Yes, this is expensive, but failure can be even more expensive. If you do own the iron, you have plans and insurance for it.

        Cost-effective: well, who really gives a shit about reddit or foursquare or quora? It really depends on your definition of effective. Pity the other guys who have real business models built on this stuff.

  2. Tom 15
    Terminator

    Hmm

    On the same day that Skynet became self-aware? Coincidence? I think not...

  3. ran93r
    FAIL

    Bloody cloud nonsense

    Reddit has been down since I got into the office this morning, I actually got some work done ffs!

  4. Andre 3
    FAIL

    DR

    Wonder if any of these so-called 'big names' have heard of DR (disaster recovery)...?

    1. Anonymous Coward
      Pint

      You'll never need DR ever

      because it's all in the cloud. Trust us, there's no risk at all.

    2. Anonymous Coward
      Boffin

      @Andre

      DR in the cloud?

      Errm, OK... it's not that simple.

      First, you're not the owner of the hardware or the data center. So how do you impose DR?

      For what you're paying... what DR?

      You want DR? Build your own data center, staff it, and run your own hardware configurations.

      You can do it, but it's not cheap or easy.

      Outside of these startups, think about trying to handle DR for data measured in petabytes (PB).

      Trust me, it's not easy.

  5. Anonymous Coward
    FAIL

    Great idea guys

    ...when it works. But centralization of anything on any hardware is guaranteed to fail at some point, and for me the Cloud is one big failure waiting to happen.

    As I always say: keep it simple, stupid.

  6. Anonymous Coward
    Thumb Up

    In other news

    Office productivity is up 37%

  7. Campbeltonian

    Availability zones

    What's curious about this is that it affected all four availability zones in US-EAST-1.

    When one of my company's instances started having problems this morning, I attempted to start a replacement instance in a different availability zone and restore from an EBS backup. This failed.

    The whole point of availability zones is that if one zone goes down, the others remain unaffected.

    The long-standing issue with EC2 is that there's no easy way to copy images, instances and EBS volumes from one *region* to another. Ideally my company would have images and backups available in a number of different regions, so that if Virginia disappears off the map, we can just start a new instance in Dublin as if nothing had happened. As it stands, we have everything in the affected region because having anything anywhere else is impractical.
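The cross-region fallback described here can be sketched as a toy model: keep a copy of each backup in more than one region, then restore from the first healthy region that holds it. `BackupStore`, `replicate`, `restore_region` and the snapshot id are hypothetical names, and the region strings are only labels; nothing here calls a real AWS API or performs the cross-region copy itself.

```python
from dataclasses import dataclass, field
from typing import Optional

# Toy model only: region names are labels and snapshot ids are made up.
# "replicate" just records where copies exist; it does not copy any data.


@dataclass
class BackupStore:
    # region name -> ids of the snapshots held in that region
    copies: dict[str, set[str]] = field(default_factory=dict)

    def replicate(self, snapshot_id: str, regions: list[str]) -> None:
        """Record a copy of the snapshot in every listed region."""
        for region in regions:
            self.copies.setdefault(region, set()).add(snapshot_id)

    def restore_region(self, snapshot_id: str, preferred: list[str],
                       healthy: set[str]) -> Optional[str]:
        """First healthy region, in preference order, that holds the snapshot."""
        for region in preferred:
            if region in healthy and snapshot_id in self.copies.get(region, set()):
                return region
        return None


store = BackupStore()
store.replicate("snap-web01", ["us-east-1", "eu-west-1"])

# Virginia (us-east-1) is unavailable, so the restore falls back to Dublin.
print(store.restore_region("snap-web01", ["us-east-1", "eu-west-1"],
                           healthy={"eu-west-1"}))  # eu-west-1
```

The preference list keeps normal restores local and only reaches for the remote copy when the home region is marked unhealthy.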

    Fortunately, only one of our instances in US-EAST-1 was affected - most have carried on working fine.

  8. Matt J
    Stop

    To be fair to Amazon...

    ...only one of their availability zones had issues, in much the same way that a private DC can have problems. If the Web 2.0 outfits had designed for multi-site delivery (using Amazon or another provider as a secondary site), they would have been fine.

    1. Andy Barker
      FAIL

      Whole region is having problems

      As Campbeltonian says, a whole Region pretty much went titsup! People's multi-AZ setups are having problems in that Region.

      Makes you wonder how the heck this could have happened.... then again on occasions we have whole chunks of the UK dropping off the internet supposedly due to a single router / exchange failure.
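Multi-AZ setups failing along with the region is what you would expect if the zones share a dependency. A back-of-envelope model, assuming one shared component (for example the EBS control plane AWS mentions in its status updates) and otherwise independent zone failures; the probabilities are invented for illustration and are not AWS figures.

```python
def p_all_down_independent(p: float, n: int) -> float:
    """All n zones fail at once, each independently with probability p."""
    return p ** n


def p_all_down_shared(p: float, n: int, q: float) -> float:
    """As above, plus a shared dependency that fails with probability q and
    takes every zone down with it."""
    return q + (1 - q) * p ** n


# Invented numbers: 1% per-zone failure, 3 zones, 0.1% shared-dependency failure.
p, n, q = 0.01, 3, 0.001
print(p_all_down_independent(p, n))  # about 1e-06
print(p_all_down_shared(p, n, q))    # about 0.001: the shared part dominates
```

Adding zones shrinks only the `p ** n` term; the shared term `q` is untouched, which is why more zones in the same region did not help.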

  9. Chris Miller

    Infoworld: "IT's cloud resistance is starting to annoy businesses"

    Spooky coincidence or what? Infoworld today has a blog arguing that resistance to cloud computing by IT 'luddites' may be career limiting. The sad thing is, he's probably right. And if you understand why that is, you'll understand a lot about what is wrong with business management in general and IT management in particular.

    As one of the comments says: "This Kool-Aid sure tastes funny".

    http://www.infoworld.com/d/cloud-computing/its-cloud-resistance-starting-annoy-businesses-383

    1. Anonymous Coward
      Anonymous Coward

      title

      I love how that article moans at those with specialist knowledge in provision of IT services who are (rightly) skeptical of leaping onto the latest rebranding of mainframe computing without fully researching how it would affect the business and taking their time to determine if it's even worth the risk of transitioning the current system over to a new one.

      I wonder if they would happily take some experimental medication against the advice of their doctor too?

  10. LaeMing
    Unhappy

    All sing:

    The Web2 men, say "up today"

    But all my data's gone away

    And it's raining (dum dum dum-dum-dum)

    Raining in my cloud.

    1. Chris Miller

      There's an old saying:

      Don't piss down my back and tell me it's raining.

      The Outlaw Josey Wales (1976)

  11. Anonymous Coward
    FAIL

    AWS Updates

    8:54 AM PDT We'd like to provide additional color on what we're working on right now (please note that we always know more and understand issues better after we fully recover and dive deep into the post mortem). A networking event early this morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes. Additionally, one of our internal control planes for EBS has become inundated such that it's difficult to create new EBS volumes and EBS backed instances. We are working as quickly as possible to add capacity to that one Availability Zone to speed up the re-mirroring, and working to restore the control plane issue. We're starting to see progress on these efforts, but are not there yet. We will continue to provide updates when we have them.

    10:26 AM PDT We have made significant progress in stabilizing the affected EBS control plane service. EC2 API calls that do not involve EBS resources in the affected Availability Zone are now seeing significantly reduced failures and latency and are continuing to recover. We have also brought additional capacity online in the affected Availability Zone and stuck EBS volumes (those that were being remirrored) are beginning to recover. We cannot yet estimate when these volumes will be completely recovered, but we will provide an estimate as soon as we have sufficient data to estimate the recovery. We have all available resources working to restore full service functionality as soon as possible. We will continue to provide updates when we have them.

    11:09 AM PDT A number of people have asked us for an ETA on when we'll be fully recovered. We deeply understand why this is important and promise to share this information as soon as we have an estimate that we believe is close to accurate. Our high-level ballpark right now is that the ETA is a few hours. We can assure you that all-hands are on deck to recover as quickly as possible. We'll update the community as we have more information.
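The mechanism described in these updates (a burst of re-mirroring exhausting spare capacity in one zone, with recovery paced by how fast new capacity is brought online) can be illustrated with a toy simulation. This is an assumption-laden sketch with made-up numbers and one-slot-per-volume accounting; `recovery_timeline` is a hypothetical helper, not a description of how EBS actually allocates storage.

```python
def recovery_timeline(stuck: int, spare: int, added_per_round: int,
                      max_rounds: int = 100) -> list[int]:
    """Toy model of volumes waiting for a new mirror.

    Each round, every free slot takes one stuck volume (and is then used up);
    afterwards, operators bring `added_per_round` new slots online.
    Returns the number of volumes still stuck after each round.
    """
    history: list[int] = []
    for _ in range(max_rounds):
        used = min(stuck, spare)
        stuck -= used
        spare -= used
        history.append(stuck)
        if stuck == 0:
            break
        spare += added_per_round
    return history


# Made-up numbers: 1,000 volumes lose their mirror at once, only 200 spare slots.
print(recovery_timeline(stuck=1000, spare=200, added_per_round=300))
# [800, 500, 200, 0]

# With no new capacity, the backlog never clears (capped at 5 rounds here).
print(recovery_timeline(stuck=1000, spare=200, added_per_round=0, max_rounds=5))
# [800, 800, 800, 800, 800]
```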

  12. RollinPowell
    Joke

    live by the cloud, die by the cloud

    The cloud is great until it turns to blue sky.

    That's why I use the rock (er... is there a better antonym of cloud?)

  13. E 2

    @AC 21st April 2011 19:17 GMT

    You can script AWS (esp. EC2) so that failed instances trigger new instances being fired up, and those new instances can be launched at other AWS locations. So the single-point-of-failure charge levelled at AWS is not entirely fair.

    And, well, if you have all your infrastructure in a server room at your head office, that room or its network connections can fail too. That must count as a single point of failure!
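The kind of script described above can be sketched as a single watchdog pass: check each instance, and relaunch any failed one in a different location that is not known to be down. The `launch` callback, the `healthy` check and the location names are hypothetical stand-ins for whatever provisioning API you actually use; this is not the AWS SDK.

```python
from typing import Callable

# Hypothetical types: a location is just a label such as "us-east-1a", and
# launch(location) is assumed to start an instance there and return its id.
Launcher = Callable[[str], str]
HealthCheck = Callable[[str], bool]


def replace_failed(instances: dict[str, str],
                   healthy: HealthCheck,
                   locations: list[str],
                   down: set[str],
                   launch: Launcher) -> dict[str, str]:
    """One watchdog pass over an instance-id -> location map.

    Healthy instances are kept. Each failed instance is replaced by a new one
    in the first location (in preference order) that differs from the failed
    instance's location and is not marked as down.
    """
    result: dict[str, str] = {}
    for instance_id, location in instances.items():
        if healthy(instance_id):
            result[instance_id] = location
            continue
        candidates = [loc for loc in locations
                      if loc != location and loc not in down]
        if not candidates:
            # Nowhere safe to relaunch: keep the failed entry so it stays visible.
            result[instance_id] = location
            continue
        target = candidates[0]
        result[launch(target)] = target
    return result


# Usage: all of us-east-1 is in trouble, so the failed web server is
# relaunched in a European zone.
fleet = {"i-web": "us-east-1a", "i-db": "eu-west-1a"}
new_fleet = replace_failed(
    fleet,
    healthy=lambda i: i != "i-web",
    locations=["us-east-1a", "us-east-1b", "eu-west-1a"],
    down={"us-east-1a", "us-east-1b"},
    launch=lambda loc: f"i-new-{loc}",
)
print(new_fleet)  # {'i-new-eu-west-1a': 'eu-west-1a', 'i-db': 'eu-west-1a'}
```

Note that the `down` set has to be supplied from outside; as the AWS updates show, a script that only avoids the failed zone can still relaunch into a region that is struggling as a whole.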

    1. Combat Wombat
      Flame

      *sigh*

      Which is why you have separate, redundant circuits for your network you muppet.
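The benefit of separate, redundant circuits can be put in numbers, assuming the circuits fail independently (different carriers, different physical routes). The 99% per-circuit figure is purely illustrative, and if both circuits pass through the same exchange or duct, the independence assumption (and most of the gain) disappears.

```python
def combined_availability(per_link: list[float]) -> float:
    """Availability of a set of redundant links where any single working link
    is enough. Assumes the links fail independently of each other."""
    p_all_down = 1.0
    for a in per_link:
        p_all_down *= 1.0 - a  # chance this particular link is down
    return 1.0 - p_all_down


# Illustrative figures: each circuit is up 99% of the time.
print(combined_availability([0.99]))        # about 0.99
print(combined_availability([0.99, 0.99]))  # about 0.9999
```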

  14. Muckminded

    Etc

    Sounds like they need a little more redundancy. Sounds like they need a little more redundancy.

  15. GCom

    Fragility of block storage & hypervisors for clouds

    Is it just me, or does distributed block-level storage sound like a rather poor concept? The issue with distributing blocks is that, being the lowest common denominator, block storage is also the most sensitive to latency.

    Fundamentally, AWS needs this approach because it uses a hypervisor providing hardware emulation, which requires direct block access for the VM images. One has to ask whether this is really such a good long-term approach for clouds, given its considerable performance overhead as well as its fragility... PaaS anyone?
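The latency concern can be illustrated under one assumption: that a mirrored volume acknowledges a write only after every replica has confirmed it (synchronous mirroring). In that case the slowest replica sets the pace of every write. The latencies below are made up, `mirrored_write_latency` is a hypothetical helper, and this is not a claim about EBS internals.

```python
import statistics


def mirrored_write_latency(replica_latencies_ms: list[float]) -> float:
    """Synchronous mirroring: the write is acknowledged only when every
    replica has confirmed it, so the slowest replica sets the latency."""
    return max(replica_latencies_ms)


# Hypothetical per-replica latencies (ms) for a sequence of block writes.
writes = [
    [1.0, 1.2],
    [1.1, 1.0],
    [0.9, 45.0],  # one replica stalls, e.g. while it is being re-mirrored
    [1.0, 1.1],
]

per_write = [mirrored_write_latency(w) for w in writes]
print(per_write)                   # [1.2, 1.1, 45.0, 1.1]
print(statistics.mean(per_write))  # mean is dominated by the one stalled write
```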

  16. Anonymous Coward
    Troll

    re: ElasticHosts advertising spam

    Oh dear... business must be bad if you need to resort to spamming forums featuring your so-called "direct competitor"... more ElasticHosts spam...

  17. Eddie Johnson
    FAIL

    Remember What a Cloud Is

    A "cloud" is not a physical entity; it's a graphical representation on a schematic that essentially means "not our responsibility." Apparently it's not Amazon's responsibility either. If your company trusts its core business to this model, it deserves to be offline, permanently.

    This is the equivalent of hosting your company web site on Geocities unless your hosting agreement provides guarantees for not just hosting costs but also lost revenue.

  18. mmouse19

    AWS in the bunker

    http://www.youtube.com/watch?v=m3wrBFuGK2A
