Whoops, there goes my cloud: What to do when AWS forsakes you

It's been an interesting period for cloud services, with both Amazon and Skype suffering major outages. In Amazon's case it was a good old-fashioned network interruption that caused significant knock-on effects (Amazon released a very frank explanation that is well worth a read). As for Skype, it was a configuration change that broke …

  1. Anonymous Custard

    Simple fact of life

    A good provider is one who learns from their mistakes and problems.

    A better provider of course is one who also learns from those of their peers and competitors, and also from the past.

    1. Your alien overlord - fear me

      Re: Simple fact of life

      learns from their peers *and* acts to make sure it doesn't happen to them.

      1. Anonymous Custard

        Re: Simple fact of life

        Personally I take one to include the other.

        But then I'm an engineer and not a manager, so probably a fair point to make...

  2. Your alien overlord - fear me

    You say boutique suppliers may not have geographic redundancy, but it seems neither do AWS or Skype.

    This is why I can't get my head around why people bother with cloudy stuff. At the end of the day, it's just your data sitting on someone else's server, somewhere out of your eyesight. It'll be run by the same level of competence as your own in house engineers (take that either way) but without you knowing exactly how the setup is and what security/resilience/redundancy is in place.

    1. AMBxx Silver badge


      Whenever anything does go badly wrong, it always seems to be something trivial that just snowballs - DNS, typically.

    2. Anonymous Coward
      Anonymous Coward

      Re: what security/resilience/redundancy is in place.

      What you said, plus...

      Do these offsite clouds offer any realistic way for customers to run a test to see how the system behaves when things start to go titsup? Thought not.

      Obviously there are many cases where the behaviour in the event of failure doesn't matter much, but in the general case, some people might want to know whether the system is going to fail gracefully or fail catastrophically. Some people are even still willing to pay for systems that are unlikely to fail, but they're probably still going to be using 'legacy' systems e.g. Tandem NonStop etc.

      Where failures are expected, some people might want to have some idea of how long the system will take to recover, and what kind of state actually counts as "fully recovered".

      What's a plan? Something that shows the layout of the executive washrooms, probably.
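      For what it's worth, the kind of failure rehearsal being asked for can at least be approximated client-side, in the spirit of Netflix's fault-injection approach. This is a toy sketch only: `FlakyService` and its timings are invented for illustration, and against a real provider you would be polling a real health endpoint instead.

```python
# Toy fault-injection harness: break a simulated service, then measure
# how long it takes to report healthy again. All names here are
# illustrative stand-ins, not any cloud provider's API.

import time


class FlakyService:
    """Simulated service that stays unhealthy for a fixed period after a fault."""

    def __init__(self, recovery_seconds):
        self.down_until = 0.0
        self.recovery_seconds = recovery_seconds

    def inject_failure(self):
        # Mark the service as down until `recovery_seconds` from now.
        self.down_until = time.monotonic() + self.recovery_seconds

    def healthy(self):
        return time.monotonic() >= self.down_until


def time_to_recover(service, poll_interval=0.01):
    """Poll until the service reports healthy; return the elapsed time."""
    start = time.monotonic()
    while not service.healthy():
        time.sleep(poll_interval)
    return time.monotonic() - start


svc = FlakyService(recovery_seconds=0.05)
svc.inject_failure()
print(f"recovered after {time_to_recover(svc):.2f}s")
```

      Even a crude harness like this answers the two questions raised above: does the system fail gracefully, and how long does "fully recovered" actually take.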

    3. Adam 52 Silver badge

      Read the linked blog post from Netflix. AWS and Azure (especially Azure) have geographic redundancy but you have to code for it.

      Since you code it, you know exactly the scope - and limitations - of the redundancy. Cloud isn't a magic bullet, it doesn't solve all your problems for you. Some, like in-region DB replication, it makes much easier (compare configuring Tungsten with RDS read-replicas, for example).
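      The "you have to code for it" point can be illustrated with a minimal sketch of application-level failover between regions. Everything here is a hypothetical stand-in (the region names, the fetch function, the simulated outage), not any vendor's actual SDK: the real version would hit regional service endpoints or read-replicas.

```python
# Minimal sketch of application-level geographic failover: try the
# preferred region first, fall through to the next on failure.

REGIONS = ["eu-west-1", "us-east-1"]  # ordered by preference

UNAVAILABLE = {"eu-west-1"}  # simulate an outage in the primary region


def fetch_from_region(region, key):
    """Stand-in for a regional read (e.g. querying a read-replica)."""
    if region in UNAVAILABLE:
        raise ConnectionError(f"{region} unreachable")
    return f"{key}@{region}"


def read_with_failover(key):
    """Return the value from the first region that answers."""
    last_err = None
    for region in REGIONS:
        try:
            return fetch_from_region(region, key)
        except ConnectionError as err:
            last_err = err  # note the failure and try the next region
    raise RuntimeError("all regions down") from last_err


print(read_with_failover("user:42"))  # served from us-east-1 during the outage
```

      The point stands that nothing here is magic: the scope and limitations of the redundancy are exactly what you wrote in the failover loop, no more.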

    4. VeryOldFart

      In some circumstances, yes, but in others, no. We use Workday: multi-tenant SaaS. Because they have a single code-base, they've been able to deliver new features regularly, several times a year. A single code-base must make it significantly easier to develop, test and deploy than either single-tenant SaaS or in-house systems. Their security is good and regularly audited by third parties.

      I agree, cloudy stuff can be done badly, thereby introducing extra vulnerabilities. However, if done well, the benefits to both vendor and customer are significant.

      We had an outage a couple of years ago. Failover to the backup site took four hours and we didn't lose any data - I thought that was impressive as we run HR, Benefits, Payroll, and all financials for multiple companies. Been using it for 4+ years now and consider it to have been a very positive experience.

    5. smartypants

      The Team is what matters, not where the boxes are.

      "It'll be run by the same level of competence as your own in house engineers (take that either way) but without you knowing exactly how the setup is and what security/resilience/redundancy is in place."

      When I use (say) S3 to store my data, I'm using a system maintained by a team responsible for not losing the data of hundreds of thousands of customers and ensuring a continual service. And they have a bloody good track record. How much does it cost me to hire a team large enough to survive the loss of a key person to do the same thing in-house?

      Putting aside the real world and considering a 'perfect' cloud provider that never screwed up, this still isn't going to save your bacon if the architecture you build with their bits or the policies you adopt are poorly thought-through, or... if the key personnel you hire to manage the cloud resources leg it for another job leaving you in the lurch.

      Where the boxes are doesn't really matter so much. The team is still king. That is where your success and failure lie - not in the supposed flaws in cloud services. I can't for the life of me understand this vitriol for cloud services. That isn't the problem with IT.

    6. Roj Blake Silver badge

      "It'll be run by the same level of competence as your own in house engineers (take that either way) but without you knowing exactly how the setup is and what security/resilience/redundancy is in place."

      Unless you do your research, of course.

      And when you do, you'll almost certainly see that a good cloud provider will have far better security, resilience and redundancy than your in-house IT provision.

  3. TimR

    @ Your alien overlord - fear me

    Hear hear!

    But you must appreciate that it gives PHBs someone to blame without any fear of any responsibility lying at their (the PHBs') doors

    1. Anonymous Custard

      And of course someone else to shout at and demand urgent and extensive meetings with.

      You know, those meetings that managers demand with engineers, to insist on knowing in minute detail (detail they wouldn't actually understand even if provided with it) how they're going to fix the problem. Usually extensive and boring ones that take up most of the time the engineer would actually prefer to be using to fix the issue itself.

      Or in my case (in the role of said engineer) the one that leads to the quote:

      "Do you want me to just talk about fixing it or just actually go and do so?"

    2. Dr Dan Holdsworth

      The downside for the jobbing PHB here is the tendency to want to brand services with the local branding. So, instead of buying in an email supplier for the company Acme Widgets Ltd and simply telling the staff that you've done that, the cloud email supplier is often, even usually, branded as Acme Widgets Ltd email.

      So, when it suffers an outage, those people who know that the email is outsourced to cloud will blame the PHB for using unreliable cloud services, and those who don't will simply blame the Acme Widgets Ltd BOFH.

      Either way, whoever is in the BOFH role and whoever is the PHB for Acme Widgets Ltd is going to get it in the neck either for running a crap service, or for choosing the wrong cloud supplier, or (moving higher up the chain) for trying to scrimp and save a few quid and ending up instead costing the company $BIG_BUCKS when the system goes tits up.

      Basically, you can't win in this game. Either the lusers blame you for the solution costing too much, or for it being unreliable.

  4. Anonymous Coward
    Anonymous Coward

    and as I read this...

    ...Facebook appears to have gone a bit titsup. As in offline.

    1. Anonymous Coward
      Anonymous Coward

      Re: and as I read this...

      Yep, service unavailable. If it weren't a Monday it'd boost productivity.

  5. Kev99 Silver badge

    I've been saying since "the cloud" was first foisted onto the unsuspecting that anyone who decided to put sensitive, critical, irreplaceable or proprietary data on the web was an idiot and deserved any losses suffered. Remember, it's called the "web" or "net", NOT the vault or safe.

    1. Roj Blake Silver badge

      Newsflash: people who look after their own sensitive, critical, irreplaceable or proprietary data also suffer IT problems.

  6. ICT Consultant

    It is always risk evaluation...

    There are many ways of looking at this...

    The cloud marketing hype of "low cost, security, stability, redundancy, backup" etc. is like any other IT vendor's marketing pitch. It all needs to be assessed very analytically and, based on that, presented to management on a cost/risk basis...

    Yes it is feasible to provide failover backup to Office 365 - and sometimes it is optimal to have a hybrid approach - providing the efficiency of local resources with the redundancy of cloud solutions...

    As always, knowing what is happening where is critical!

    Ironically we found that out last week when our Office 365 email backup/redundancy service almost came to a grinding halt - because of the AWS outage. This affected the performance of the email function at the client even though the primary service was still fully functional!

    At the end of the day, it is critical to inform management up front - and present them with scenarios. We had another client suffer from a hardware failure at 5:15 on a Thursday afternoon... with 5 x 8 next business day support from Dell. Yup - Dell arrived on the Monday to repair the server!

    As always, management are very keen to save money - they MUST be made aware of the risks of doing so. And unfortunately many of them have this hype-driven perception that if everything is thrown into the cloud it will all work perfectly and everyone will live happily ever after...
