Telecity London data centre outage borks VoIP, websites, AWS...

Telecity has suffered a major outage at one of its London data centres this afternoon, which knocked out a whole host of VoIP firms' services, made Amazon wobble and borked its Direct Connect service. A source told The Register that the outage, which happened at around 2pm, knocked out four floors at Telecity's Sovereign House …

  1. Alister

    Telecity refused to comment when The Register phoned them to ask what had happened.

    Oh! You managed to get through then?

  2. Your alien overlord - fear me

    Don't mention you-know-who to Mr Osborne. Nah, it couldn't be them. Could it?

    1. Tom Chiverton 1

      Sssh! We're trying to stop them torpedoing the UK tech industry by banning encryption and getting in the way of security research.

    2. Daggerchild Silver badge

      Once again, by the time the authorities get there, Corbyn is nowhere to be seen!

    3. Mpeler
      Pint

      Not you-know-who, but you-know-who

      The Register understands both primary and backup power supplies went down, potentially affecting thousands of customers.

      Sounds like BOFH and PFY were out and aboot, not being beer o'clock yet...

  3. Anonymous Coward

    Looks like Amazon AWS Direct Connect customers are also affected:

    http://status.aws.amazon.com/

    6:47 AM PST We are investigating packet loss between the Direct Connect location at TelecityGroup, London Docklands, and the EU-WEST-1 Region.

    7:36 AM PST We can confirm intermittent packet loss between the Direct Connect location at TelecityGroup, London Docklands, and the EU-WEST-1 Region. An external facility providing Direct Connect connectivity to the EU-WEST-1 Region has experienced power loss. We are working with the service provider to mitigate impact and restore power.

  4. ianx
    Unhappy

    Yep Direct Connect is certainly impacted!

    Also looks like inbound internet access to the EU-WEST-1 region is being hit generally, as we're seeing substantially elevated latencies on applications running there.

  5. Linker3000

    Payment processing?

    Could this explain why Paypal is borked for me and the missus had issues with our Santander card in M&S?

    /Yes, we checked our balance.

  6. nsld

    That's one way

    To stop those pesky terrorists using the interwebs: stop putting 50p in the meter!

  7. Valarian

    Not Just Direct Connect

    Links to Amazon VPC infrastructure in EU-WEST-1 were also hit - my phone got very warm between 2pm and 4pm this afternoon with people wanting to know why various services were going dark for minutes at a time...

  8. John Brown (no body) Silver badge

    Amazon?

    Don't they have automatic failover in their elastic cloudy stuff? Or don't they eat their own dogfood?

    1. John Brown (no body) Silver badge

      Re: Amazon?

      Why the downvote? Am I wrong in thinking that various "cloud" providers, including Amazon, sell their services based on automatic failover even if an entire data centre goes off-line?

  9. DaleWV

    Now I can understand an undersea cable being a single point of failure with possibly very widespread implications, but if a power failure on a few floors of a single building can have such widespread implications, then don't we have a deeper problem?

    Fortunately for us this only really impacted our test & dev environments and, ironically, our DR capability.

    1. Anonymous Coward

      Fortunately for us this only really impacted our test & dev environments and, ironically, our DR capability.

      Don't forget the social impacts which were more serious: I noticed that grumble feeds were intermittent and slow last night, and I'd therefore like to ask Telecity to investigate their backup power provision to prevent this happening again.

  10. Anonymous Coward

    Terrorism and 'won't somebody think of the children'

    This is what happens when you let the spooks splice into the networks.

    Normally they blame it on submarines breaking underwater cables. This time they fucked it right up.

  11. Anonymous Coward

    AWS Direct Connect affected

    Yep, all our production Amazon web services were down for 5 hours. Nobody could access their desktops or emails. Where is the resiliency?!? Another knock on the cloud hype train. Never would have happened had we left things how they were in the datacentre.

    1. smartypants

      Magical non-cloud IT strikes again

      Honestly I wish I had a pound for every mention of the mythical non-cloud IT. You know, the one that can never go wrong...

      1. Anonymous Coward

        Re: Magical non-cloud IT strikes again

        It can go wrong, but 5 hours for a major player?

    2. Smoking Gun

      Re: AWS Direct Connect affected

      Just because you chose to put your production environment in a cloud doesn't mean you get resilience by default, right? Do you not have to architect your cloud environment like you would your own physical environment, selecting resilient datacentres for your production workloads? (There's a rough sketch of that kind of check after the last comment.)

      In Azure, for example, you might tick the geo-redundant option, or make sure your backups go to something outside Azure for DR purposes. If a business jumps on the cloud train and works on the basis that it all "just works" when things go tits up, surely it needs to think again?

  12. Anonymous Coward

    Back up often

    Both UPS channels went offline in a cascade failure due to loading. Then the transfer to mains was disruptive and the transfer back to UPS failed.

    And the second attempt to switch back has also failed, and that involved switching it off and on again, so it was a proper IT fix, not some bodge.

    Currently it's running on utility power. It's not the first time the UPS systems at Sovereign House have gone out like this either...

    1. John Stoffel

      Re: Back up often

      This is one of those areas where you'd think they would be running tests on the system on a regular basis, with N+1 redundancy. If you don't test, you don't know.

      But I've run into stuff like this before, where we had a dodgy transfer switch: if you let it sit for a month or two, one of the phases would stick and not flip over the next time you had an outage. But once you tested it... it would be happy as a clam and would switch back and forth no problem. It took me doing monthly tests and metering the panel to finally find and prove the problem. Took over a year to find and solve this issue.

      Ever since then... I test and check the voltages on the transfer switch.

      Now in this case, if the loads are too high... then I suspect someone goofed and overloaded a single phase or something, so that too much power is being pulled from one leg at a time, which doesn't allow things to come up cleanly. Not a fun situation if you haven't planned for it and don't know how to shed load (i.e. turn off crap...) as you bring things up, letting disks spin up one by one instead of a huge thundering herd.
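
The point Smoking Gun makes above is easy to check in practice: on AWS, spreading a workload across availability zones only happens if you deploy it that way. The snippet below is a minimal sketch rather than anything from the thread; it assumes boto3 with working credentials and a hypothetical Environment=production tag on the instances, and simply counts how many eu-west-1 availability zones a fleet actually spans.

    import boto3
    from collections import Counter

    # Minimal sketch: count running instances per availability zone in eu-west-1.
    # Assumes configured boto3 credentials and a hypothetical Environment=production tag.
    ec2 = boto3.client("ec2", region_name="eu-west-1")

    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "tag:Environment", "Values": ["production"]},  # hypothetical tag
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]

    # Tally where the running instances actually live.
    zones = Counter(
        instance["Placement"]["AvailabilityZone"]
        for reservation in reservations
        for instance in reservation["Instances"]
    )

    print("Running instances per AZ:", dict(zones))
    if len(zones) < 2:
        print("Whole fleet is in one AZ; a single facility outage takes the lot down.")

If the counter only ever reports one zone, the automatic failover some commenters expected was never actually in play, whatever the provider's marketing says.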
