O2 kicks out Ericsson server for breaking its network

Ericsson's Centralized User Database has been fingered by O2 for a second network outage which hit the operator last week, and will thus be given the boot despite the £10m cost of a replacement. Last week's outage wasn't as serious as the 21-hour downtime which hit O2 customers in July, but it was down to the same bit of kit …

COMMENTS

This topic is closed for new posts.
  1. Mike Brown

    "We are not prepared to risk this happening to our customers for a third time,"

    But they were for a 2nd time? That's top quality customer service right there.

    1. Anonymous Coward

      O2 were relying on a 3rd party 'bit of kit' that went wrong once (acceptable).....2nd time (*looks for ways for it to never happen again*)....3rd time (*kicks old kit out and uses 'proven alternative solution'*).

    2. Anonymous Coward

      So...

      ...you'd splash out £10million on new kit on a single outage. Please, do NOT ever work for us.

      1. Mike Brown

        Re: So...

        Yes, as it was broken. Clearly it was, and they either hoped it would work, or didn't actually notice it was broken.

      2. A Non e-mouse Silver badge

        Re: So...

        ...you'd splash out £10million on new kit on a single outage. Please, do NOT ever work for us.

        I think you need to put this into perspective.

        Their annual upgrade & expansion costs are in the region of £550M. Spending £10M on one piece of kit that has caused two big outages (and maybe internal headaches), plus a massive dent in customer (and investor) confidence, is a reasonable business decision.

        1. Annihilator
          Headmaster

          Re: So...

          "Their annual upgrade & expansion costs are in the region of £550M. Spending £10M on one piece of kit that has caused two big outages (and maybe causing internal headaches) plus a massive dent in customer (and investor) confidence is a reasonable business decision."

          Yes, now it's a reasonable decision - but the OP was suggesting replacing it after ONE incident.

          Also, be under no illusion that this is one piece of kit - it'll be a whole system.

      3. FordPrefect

        Re: So...

        When that "bit of kit" cuts off 10% of your customer base (that's gotta be what, 100k people?) for 24 hrs or so, a single spend of £10 million looks essential to me. If people get cut off for a third time for an extended period, customers, especially business customers, will start to jump ship. It's not like O2 are any cheaper than the other mobile companies...

        1. Anonymous Coward

          Re: a single spend of £10 million looks essential to me

          Please do NOT work for that guy up in the comments.

          Though, as he's anonymous, you'll only know if you are when their service goes titsup and they all look at it and say, weeell, it's only happened once, let's see if it happens again.

      4. JetSetJim
        Stop

        Re: So...

        > ...you'd splash out £10million on new kit on a single outage. Please, do NOT ever work for us.

        Quite conceivably it won't cost them a penny - I expect a contract for the supply of this shiny box comes with KPI definitions and hefty penalties for missing those KPIs. I imagine at least 5 9's uptime is one of the contractual clauses, so O2 are probably within their rights to withhold payments.
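
        For a rough sense of scale (my own back-of-envelope, and "five nines" is just the commonly quoted figure, not something from the actual contract), here's what an N-nines availability clause would allow in downtime per year:

        # Illustrative only: downtime allowed per year at N-nines availability,
        # using downtime = (1 - availability) * minutes_per_year.
        MINUTES_PER_YEAR = 365 * 24 * 60

        for nines in (3, 4, 5):
            availability = 1 - 10 ** -nines
            allowed_minutes = (1 - availability) * MINUTES_PER_YEAR
            print(f"{nines} nines ({availability:.3%}): ~{allowed_minutes:.1f} minutes/year")

        At five nines that's roughly five minutes a year, so a 21-hour outage blows through the clause by a couple of orders of magnitude.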

        Similarly, Ericsson will be in damage-limitation mode, so will be keen to placate their customer - particularly one spending a million quid a day on kit that Ericsson presumably sell them (maybe not all of it, but probably around 50%, I'd guess).

      5. Anonymous Coward
        Thumb Down

        Re: So...

        "...you'd splash out £10million on new kit on a single outage. Please, do NOT ever work for us."

        Get real. This was a huge public embarrassment, and a failure in their core business of supplying reliable communication services. Not "people couldn't access their bills online", not a remote mast failure cutting off two crofters and their dog, but core service failures affecting millions of people. For the water industry this would be a taps-not-working moment, for the electricity industry it would be lights out, for car makers it would be recall time, Toyota style.

        Even if it had so far been a single instance affecting 7m customers, O2 need to spend whatever it costs to make sure it doesn't happen again, with an upper expenditure limit probably approaching £100m, maybe higher.

        How come £100m? With 7m customers officially affected in July, that's what, 1.3m contracts ending in the next six months that have been affected? If a typical contract has £7 a month gross profit (I think it's more like £12, but I'll work on the cautious side), and they lose an incremental 10% of those 1.3m affected, then that's lost income of around £11m a year, £22m on typical two-year contracts. And that's assuming that 90% of those affected decide to stay with O2, and that those with more than six months left on contract forget about it at renewal time. And if they chose not to fix it at my up-to-£100m, what happens next time? Another 7m punters persuaded that O2 can't be trusted? Another £22m of income lost, making for £44m of lost income, and still with a dodgy system waiting to do the same again?
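
        (For anyone wanting to check my arithmetic, here's the same back-of-envelope in code; every number in it is one of my assumptions from above, not an O2 figure.)

        # Back-of-envelope only; all inputs are the assumptions stated above.
        renewing_soon = 1_300_000    # contracts up for renewal within six months, out of the 7m affected
        profit_per_month = 7         # GBP gross profit per contract, the cautious figure
        churn_rate = 0.10            # incremental share of those affected who walk

        leavers = renewing_soon * churn_rate
        lost_per_year = leavers * profit_per_month * 12
        lost_over_contract = lost_per_year * 2   # typical two-year contract

        print(f"Leavers: {leavers:,.0f}")
        print(f"Lost income: ~£{lost_per_year / 1e6:.0f}m a year, ~£{lost_over_contract / 1e6:.0f}m over two years")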

        Certainly they need to differentiate between one-offs and systematic failures, but business always claims any failure is a one-off. People in O2 must have known that the July mess-up had a high likelihood of repeating itself, but somebody like you decided that £10m was too much for a one-off, and look where it has got them.

    3. Anonymous Coward

      And yet...

      ... by replacing the Ericsson kit, surely they risk this happening to their customers for a third time if the replacement process fails.

      In other words, it assumes the act of replacement is risk-free too.

  2. Shane8
    Go

    "Confidence is key here. Customers will pay more for reliable connectivity and will avoid any company they don't believe capable of delivering it."

    That's why I'll never use 3 again!

    1. leexgx

      Three's main issue is not putting up new masts to fill in dead spots indoors and outdoors (less than 10% signal), but that could be more down to not having much spectrum compared to all of the other networks, and they only use the 2100 band, so walls really kill Three's signal (I'm sitting between three T-Mobile/Three masts; the problem is 10% to no signal indoors, so it tends to stall the radio on the phone a lot as it switches between the masts, and it's been like that for 5 years).

      I can't fault their customer support though (I know more than most customer service reps normally do, but Three's are basically at my level when I talk to them, which is nice; shame the network coverage isn't). The only issue is when you want to leave: it takes an hour to get connected and they try to keep you by lowering the contract price (at least the call is free).

  3. EddieD
    WTF?

    Kudos for honesty.

    I'm still pissed off, but I'll give them credit (slightly less because they're saying "it wasnae us...") for stating publicly what the error was, and detailing the steps they'll be taking.

    And a really big "WTF" for not putting a failover solution in place after the July outage that did cause me problems.

  4. Steve 53

    But...

    "CUDB node is based on a Distributed Cluster Architecture which guarantees high capacity with an optimal footprint and real time availability.", it says so right on the website!

    1. Anonymous Coward

      Re: But...

      I read the website as well, but for some reason rather than what it says:-

      "CUDB provides a single point of access and administration..."

      I read it as

      "CUDB provides a single point of failure..."

    2. Alan Brown Silver badge

      Re: But...

      It wouldn't be the first time that a High Availability layer has proved to be the least reliable part of the whole setup.

  5. zaax

    I'd want 110% of the cost back so I can give a bribe to all my customers so they don't go.

  6. Anonymous Coward

    maybe?

    First outage, 50% of people affected; second outage, 10% of people affected. Possibly they've been migrating people off it since the first outage, but weren't finished.

  7. codejunky Silver badge

    I am impressed. O2 suffered some failures, which is bound to have damaged their reputation. This brute-force honesty of accepting the failure, identifying it and stating they are swapping out the problem for a proven solution should not only restore their reputation; I actually see them as better than I did before.

    It would be interesting to see a survey of how much people's confidence in the network has been hit, and whether their reputation is seen as better or worse after disclosing the facts and attempting to fix the problem.

    Even if their new solution hits some bumps, I will have far more respect for and loyalty towards them if they continue to be factual about any issues.

    1. ahd
      WTF?

      IFFFFFFF they were factual! Although O2 claim they had the problem fixed after 24 hours, the reality was that users were left struggling well beyond 48 hours, and in many cases 72 hours. As a user it was immaterial to me that the failed kit was replaced after 24 hours; a problem is not fixed until the last user is back to full functionality. The HONEST approach would have been to recognise the length of time users struggled, rather than to tell users all was well to avoid paying compensation.

  8. Leebeejeebee
    Flame

    Funny that...

    ... I have a Sony Ericsson phone (on the 3 network) that seldom has a signal, and when it does it's very weak. It also crashes regularly (probably due to lack of signal) and drops calls, even when it does have a signal. I'm in dispute with 3 at the moment about it, as I have had one replacement and two repairs, and the damn thing still won't work properly. I will never again get a Sony Ericsson, and will never again get a contract phone on 3.

    1. fixit_f
      Thumb Up

      Re: Funny that...

      My other half has massive problems with three as well - they probably have the worst reputation of all the networks I'd say.

    2. garden-snail

      @Leebeejeebee Re: Funny that...

      "I will never again get a Sony Ericsson..."

      Nor will anyone else:

      http://www.theregister.co.uk/2012/02/16/sony_ericsson_divorce_final/

    3. Anonymous Coward

      Re: Funny that...

      I've had Ericsson and Sony Ericsson phones, and also dealt with Ericsson on a professional basis. The phones were the most reliable of any I've owned, and among Scandinavian telco suppliers they're far and away the nicest and most competent ones to work with. I do hope that this blows over for them quickly.

    4. Anonymous Coward

      Re: Funny that...

      You mention crashes!! Does it happen to run Android? :)

  9. wabbit02

    "Huawei already has the O2 contract for next-generation kit"

    Have O2 UK announced their LTE (I assume that is what is meant by "next gen") core or access vendor? I know they have done trials and O2 DE has gone Huawei. But the UK is a big Ericsson shop (and rightly so, it's good kit).

  10. banjomike
    WTF?

    Their performance will now be measured on customers’ confidence in our network

    They are SOOOOOO dead!

  11. John Benson
    IT Angle

    Does this mean that the Erlang language doesn't work well?

    Or is it just bad application coding?

  12. Anonymous Coward

    In my experience O2 internationally has mammothly under-invested in DB admin tools, ignored the few good DBAs that have fleetingly passed through, and been driven by the needs of their service suppliers, which in some countries are diametrically opposed to the aims of O2 itself.

    The CUDB should be installed on a distributed cluster. How many O2 folk understand that level of technology?

    It doesn't matter what software, hardware or DB architecture is used; a budget installation will always be prone to failure. Most of us can recall HLR failures (through poor design and planning) over the last 12 years across several of the O2/Telefonica operators. Real-life testing of failover should be a regular procedure.

    Just a bunch of muppets, really.

    I'm posting anonymously for obvious reasons.

    1. Anonymous Coward

      well

      They spent a few million on the project; do you think they have some basic hardware? I doubt it. This is just a question of bad software/hardware, wrong strategy, wrong product, and salesmen who over-promise to make their money!! Totally useless.

  13. Anonymous Coward

    The Consequences of Failure

    ... of an HLR (or whatever they're calling them this week) are so fundamental that networks are rightly scared of messing with them.

    At Vodafone they are still entirely in-house kit, running on OpenVMS and then Linux - and with so much network-specific customisation over their twenty-year lifespan that the supposed 12-month replacement project by Alcatel-Lucent is now approaching its fifth year.

    For O2 to let this happen twice is pretty inexcusable, although as a Giffgaff subscriber I was completely unaffected by virtue of the pretty big coverage areas on modern 3G MSCs. Stay put and you'll be OK.

  14. Christian Berger

    One of the flaws of GSM-ish networks

    They were never designed for reliability. That's why they have single points of failure like the HLR (which apparently failed here). There were other network architectures which did take different approaches. For example, AMPS (in the US) always accepted the first call from a new phone; it then looked up the identity and wouldn't allow a second call if you were unknown. The German B-Netz only punched call data onto punchcards, doing no real-time verification of the user at all. The German A-Netz had an operator who called you back if you wanted to make outgoing calls.

    It's mind-boggling to see how much one could save in complexity and cost if they didn't have to bill you for the service. You could use the modern RF interfaces of LTE and simply run Ethernet over them. No MSC, no HLR, no VLR and so on.

  15. Dug Stokes

    Can someone explain this one?

    "...it was not the same fault that cut off millions of customers in July."

    http://www.bbc.co.uk/news/technology-19928507

    "We are not prepared to risk this happening to our customers for a third time,"

    http://www.theregister.co.uk/2012/10/17/o2_database/

    That sounds like an out-and-out lie to me.

    1. Anonymous Coward

      O2 aren't renowned for being entirely truthful.

      You only had to look at the O2 and giffgaff forums, plus Facebook, to see that there's a good chance that more than the stated 10% of customers were experiencing issues. Plus, they claimed that the outage started at lunchtime, when in reality things had started to slide long before then.

    2. Anonymous Coward

      Why is that a lie? I read it as 'we are not prepared to risk an outage happening to our customers for a third time'. The outages are the event - the root causes could be different.

  16. M 6
    Thumb Up

    "...you'd splash out £10million on new kit on a single outage. Please, do NOT ever work for us."

    Please don't ever apply for a job at a decent IT company as you clearly can't read (there was more than one outage) and you don't understand the scale of the problem.

    Huawei - Maybe they caused the outage to get their kit into O2! ;)

  17. TxRx
    Holmes

    Hang on... what 'actually' happened with the Ericsson kit...

    and how come, post-downtime, everyone I know on O2 and Tesco is now being directly hammered with spurious sales calls? I smell a breach...

This topic is closed for new posts.
