Cloudflare engineer broke rules – and a customer's website – with traffic throttle

Cloudflare has admitted that one of its engineers stepped beyond the bounds of its policies and throttled traffic to a customer's website. The internet-grooming outfit has 'fessed up to the incident and explained it started on February 2 when a network engineer "received an alert for a congesting interface" between an Equinix …

  1. Kevin McMurtrie Silver badge

    It has to be

    It's DNS cryptocurrency. It's always DNS cryptocurrency.

  2. Gene Cash Silver badge

    Needs AI and blockchain

    That way it'll go from a simple mistake to an absolute flaming trash fire.

  3. Anonymous Coward
    Anonymous Coward

    Enough with the conspiracy theories...

    In the days before Cloudflare existed - indeed, very nearly the days before services like Cloudflare's existed, being as it was almost exactly the time Prolexic got off the ground - my site was DDoSed by an organised crime gang, and our ISP's 'help' was black-holing us entirely, on the grounds that the DDoS against us was taking down all the other clients in the datacentre. To be honest, I understood their decision then and I'd understand it again now.

    So while on the one hand, obviously it's bad that Cloudflare did this, all the "oh noes, I had no idea a network provider could do such a thing!" hand wringing is a nonsense. At the end of the day, what else do you expect to happen? All the other customers to salute and go down on the ship with you?

    1. rg287 Silver badge

      Re: Enough with the conspiracy theories...

      Enough with the conspiracy theories...

      Hear hear. Quite unhelpful that the article makes this curious statement:

      Actually throttling a customer without warning will likely fuel theories that Cloudflare, like its Big Tech peers, is an activist organization that does not treat all types of speech fairly.

      I mean, I suppose some corner of Q or Parler will probably have a whinge, but since they weren't actually suppressing a political outlet, or treating any type of speech differently (nor suppressing traffic on any basis other than sheer volume), it's not a useful data point in the general discussion of "Is too much of the internet going through CF? Are they becoming a monopoly provider? Is it a bad thing that the Venn diagram of 'the internet' and 'Cloudflare' has been trending towards a circle for a while now?" (They'll never get there entirely, of course, but there's perhaps more overlap than is healthy in a supposedly diverse and distributed network.)

      my site was DDoSed by an organised crime gang, and our ISP's 'help' was black-holing us entirely, on the grounds that the DDoS against us was taking down all the other clients in the datacentre. To be honest, I understood their decision then and I'd understand it again now.

      I agree to a point... but your datacentre will have had some sort of fair use policy, and you were presumably billed on transit, or some portion of your hosting cost was related to data - whether port speed or data usage.

      Cloudflare has a policy too... but you literally sign up with CF to avoid things like DDoS. That's the package (and the entire point) - CDN, DDoS-protection and all-you-can-eat bandwidth (notwithstanding the explicit exceptions for images/video if someone tries to build the next Flickr/Imgur/YouTube behind it). If CF struggles with capacity, that's rather their problem - not the customer's. Of course it seems to have wrong-footed them that this was huge amounts of legitimate traffic with large requests - not a DDoS that they could kill at the edge, which rather stressed their individual link to that DC. But it's still their problem to solve.

      All that being said, 3,000 requests/sec at 12 MB per request comes to... 288 Gb/sec, and $200/mo doesn't buy you a 10Gb port on most exchanges, so they were getting their money's worth whilst it was spiking!
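
      For anyone wanting to sanity-check that figure, here's a minimal back-of-envelope sketch in Python (the request rate and response size are the ones quoted above; the conversion assumes decimal megabytes and 8 bits per byte):

      ```python
      # Back-of-envelope check of the peak throughput quoted above.
      # Assumes decimal units (1 Gb = 1,000 Mb) and 8 bits per byte.
      requests_per_sec = 3_000      # requests per second, as quoted
      mb_per_request = 12           # megabytes per request, as quoted

      megabytes_per_sec = requests_per_sec * mb_per_request   # 36,000 MB/s
      gigabits_per_sec = megabytes_per_sec * 8 / 1_000        # 288 Gb/s

      print(f"~{gigabits_per_sec:.0f} Gb/s sustained at peak")
      ```

      The 288 Gb/sec figure checks out, which rather underlines the point about getting your money's worth.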

    2. Jason Bloomberg Silver badge

      Re: Enough with the conspiracy theories...

      To be honest, I understood their decision then and I'd understand it again now.

      If I were Cloudflare and had a customer effectively DDoSing the service, I'd probably pull the plug on them until the issue was sorted; the needs of the many outweigh the needs of the few, or something like that.

      Where things seem to have fallen below expectations is in liaising with the customer, and in Cloudflare perhaps not knowing what had been done to mitigate the issue until the customer started bitching.

      So, all in, a case of the engineer having done the right thing, but not having checked with the higher-ups first that it was the right thing to do.

  4. Victor Ludorum
    Joke

    Poor engineer

    The post does not mention what, if anything, happened to the engineer who applied the throttle.

    He was throttled?

    1. spireite Silver badge
      Joke

      Re: Poor engineer

      Accelerated his departure

      1. alisonken1
        Joke

        Re: Poor engineer

        Promoted to Customer!

  5. Sgt_Oddball Silver badge
    Headmaster

    They've at least done the right thing...

    Owned the issue, admitted it was human error due to procedural omissions, and stated they'll make changes to stop it happening again in the same manner. They haven't said the fix was completely wrong, but that the way it was applied was.

    If anything this actually gives some confidence in them, and right now seeing the engineer punished wouldn't be worthwhile. I'd say said engineer is now more qualified than anyone else to address similar incidents going forwards.

    All in all, well done Cloudflare for putting this out there.

    Teacher icon because every day is a school day.

    1. Martin Summers Silver badge

      Re: They've at least done the right thing...

      It wasn't human error though. They had no policy, and the engineer acted in good faith believing they were doing the right thing in the absence of one.

      1. Nick Ryan Silver badge

        Re: They've at least done the right thing...

        The human error was in not having a policy

        1. Juillen 1

          Re: They've at least done the right thing...

          Actually, I'd say it's just a learning experience. It's an example of the Frame Problem: you can think of the majority of things you need to know to keep something from stopping dead in its tracks, but there's always the unexpected. If you decide you won't do anything until you know every edge case and have a solution to everything, you'll never actually get round to doing anything other than searching for ever more unlikely edge cases to evaluate.

          Having to make a split-second decision in the heat of the moment does not give you the benefit of hindsight, or of an extended period of calm deliberation among many minds.

          1. Nick Ryan Silver badge

            Re: They've at least done the right thing...

            True. I suspect they have a policy now though

          2. logicalextreme Silver badge

            Re: They've at least done the right thing...

            Aye — and the crucial thing is to make sure that when the unexpected (or even the mildly-expected-but-not-deemed-likely-enough-to-be-worth-coding-for) happens (or happens often enough), you have processes, architecture, people and a codebase that can be adapted to deal with it using a reasonable amount of time and effort.

            I've seen it happen all too often that a line is drawn (perfectly reasonably) under the number of cases that have been programmatically anticipated in order to avoid endless searching for edge cases without ever doing an actual release, but then adding in a new case (edge or otherwise) is so difficult due to rigid processes etc. that it never gets done.

          3. John Brown (no body) Silver badge

            Re: They've at least done the right thing...

            "but there's always the unexpected."

            True. But then there's the need to differentiate between known unknowns and unknown unknowns, which will need different mitigations :-)

        2. JoeCool Bronze badge

          Re: They've at least done the right thing...

          Or possibly training: "we have many ways of shaping traffic". But that's still human error.

  6. Anonymous Coward
    Anonymous Coward

    mmm

    I expect the engineer was being screamed at by upper management to fix it, so he did.

    The real story would be management being too scared to tell the customer the reason for the blockage.

    1. tracker1

      Re: mmm

      CF has a pretty good reputation as an engineering-first culture. TBH, the only "error" was not notifying the customer.

      Guessing the escalation process being added will address that.

  7. Down not across

    "Cloudflare engineer broke rules"

    Tut. The engineer couldn't have broken a nonexistent rule.

  8. Anonymous Coward
    Anonymous Coward

    All the cloud providers do crap like this. They say they have established processes and procedures, but in any perceived crunch they go outside the rules customers have been told to expect, and then the customer's environment comes to a screeching halt because the provider did something outside the expected error handling.

  9. ecofeco Silver badge

    Yeah but....

    What WAS the actual cause? What CAUSED the traffic spike?

    1. cookieMonster Silver badge
      Joke

      Re: Yeah but....

      Oh that, that was nothing to worry about. It was just a little script to check that your office install was not EOL

    2. Roland6 Silver badge

      Re: Yeah but....

      Well, if I have read the article correctly, I suspect the customer initiated a restore from their cloud backup/archive provider.

      I suspect storage providers typically have large amounts of inbound traffic, but only infrequent requests for bulk exports - and thus large and prolonged amounts of outbound traffic when those do happen.

  10. clickbg

    Well, I mean, at least they pledged to improve the process. Yes, CF is a corporation and as a corporation has no sense of morality, but I truly believe that the people working there are mostly trying to improve the world. So kudos to them for improving. I just hope they don’t fire the engineers; prior to this there apparently wasn’t a clear procedure, so you can’t fault an employee for not following a rule that doesn’t exist.

  11. Little Mouse Silver badge

    "Blame Free Culture"

    That's a phrase that gets misused too often by companies. Hopefully not in this case though.

    I've only worked at one place that officially had a Blame Free Culture, and plenty of others that just naturally didn't feel the need to blame individuals for faults and errors unless it was genuinely deserved.

    The place with the official policy used to go to a lot of effort to identify exactly who it was that we shouldn't all blame, and made sure that everyone knew who it was that wasn't getting blamed. They even held high-level meetings to discuss the individuals who weren't being blamed, because, officially, they were so caring and people-focused.
