eBPF. It doesn't stand for anything. But it might mean bank

Meta says it has managed to reduce the CPU cycles of its top services by 20 percent through its Strobelight profiling orchestration suite, which relies on the open source eBPF project. Fans of the mass audience monetization biz may be delighted to learn that this translates to a 10 to 20 percent reduction in the number of …

  1. An_Old_Dog Silver badge

    Cost Savings

    ... and then Meta passed some of the savings on to its customers and to the employee who actually did the work.

    Meta Employee A: "Why is there a doughnut on my desk?"

    Meta Employee B: "It's a symbol of employee appreciation from the Big Boss. Also, there's a Post-It note on your monitor."

    Employee A (reading the note): "I wanted to take a moment to personally thank you for the work you did which will save this company millions of dollars each year. Keep up the great work! That's an order.

    Also: to streamline our operations, your position has been cut as of the end of this month.

    --Z."

    1. Andy 68

      Re: Cost Savings

      Oh come on, they're not that bad.

      He'd at least get a waffle party, surely?

      1. Doctor Syntax Silver badge

        Re: Cost Savings

        Waffle party?

        Is that an address to the staff by senior manglement?

        1. collinsl Silver badge

          Re: Cost Savings

          That's a hot air balloon ride

      2. l8gravely

        Re: Cost Savings

        He was sent on an ORTBO...

    2. Anonymous Coward

      Re: Cost Savings

      Devil's Advocate:

      Do the golden geese keep laying eggs? Or are these events usually the result of a one-trick pony? Do developers who identify massive saves like this typically go on to deliver massive saves in the future, or do they generally "get lucky" that one time? Do other massive saves come from other employees profiling different code? Does this developer get moved to a different project to massive-save again? Was this just a quirk of the time/position?

      How much of this was profiling (and who suggested/justified/ordered the profiling?) as opposed to software development prowess?

      Meta, Microsoft, Amazon, Google et al. probably have the statistics on these sorts of things, but I've never seen it discussed generally.

      All of this ignores the simple fact: who missed the optimization potential at the beginning, and how much pressure were they under to "get it done, not get it done well"?

      1. that one in the corner Silver badge

        Re: Cost Savings

        > who missed the optimization potential at the beginning

        Everyone. And rightly so.

        The order is: make it work; make it work correctly; make it work fast.

        If you *think* you see an opportunity for optimisation in the first two steps, just make a note of it and carry on. As soon as it is working "correctly enough" to be functional, run it through the profiler. Then, and only then, locate the hot spots - both time and memory consumption, often intertwined - and improve those. Re-profile to find the next hot spots.

        Otherwise, without data from the profiling, you will inevitably waste time on premature optimisation: the "slow code" you think you spotted may only run once a month - or it may even be excised completely by an algorithm fix whilst "making it work correctly"!
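
        To put the "measure, don't guess" step in concrete terms, here is a minimal sketch - std::chrono as a very crude stand-in for a real profiler (Strobelight, perf, whatever), and both workloads plus all names invented purely for illustration:

        #include <algorithm>
        #include <chrono>
        #include <cstdio>
        #include <string>
        #include <vector>

        // Two made-up workloads: one *looks* expensive, the other actually is.
        std::string format_rows(const std::vector<int>& v) {
            std::string out;
            for (int x : v) out += std::to_string(x) + ',';
            return out;
        }

        long long linear_lookups(const std::vector<int>& v) {
            long long hits = 0;
            for (int key = 0; key < 2000; ++key)
                hits += std::count(v.begin(), v.end(), key);   // O(n) scan per query
            return hits;
        }

        template <typename F>
        long long time_ms(F&& work) {                          // crude timing harness
            auto t0 = std::chrono::steady_clock::now();
            work();
            auto t1 = std::chrono::steady_clock::now();
            return std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
        }

        int main() {
            std::vector<int> data(200'000, 7);
            std::size_t chars = 0;
            long long hits = 0;
            std::printf("formatting: %lld ms\n", time_ms([&] { chars = format_rows(data).size(); }));
            std::printf("lookups:    %lld ms\n", time_ms([&] { hits = linear_lookups(data); }));
            std::printf("(%zu chars, %lld hits)\n", chars, hits); // keep the work observable
        }

        Only once the numbers come back do you know which of the two is worth anyone's time.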

        > (and who suggested/justified/ordered the profiling?)

        You should always have in your plan the expectation of profiling (although I am horrified how often I've worked with coders who are "expert" with their IDEs yet do not even know if they *have* a profiler, and aren't worried when the answer is "no". It was shocking when Microsoft stopped providing a profiler and nobody seemed to care!). Only drop - or postpone - that once you can see the program working and somebody properly signs off as "it is demonstrated to be running well enough".

        So the question should be: "Who *stopped* the profiling? What was their justification and was it valid at the time?"

        Lots of times, it is really easy to see that profiling and optimisation are not necessary - because (numerically) a lot of the time you are involved in only a short program, a one-off tool, probably just flung together, run and forgotten about. Great. But even then, good practice is to write down "I did not do this stage, it wasn't necessary/useful, signed Burt".

        > how much pressure were they under to "get it done, not get it done well"?

        That is what causes the profiling to be dropped - and that fact signed off by the PM and manager of course (ha, ha).

        BUT even after finding this saving of x million pounds and k servers no longer needed, saying "Why was this allowed to happen? Why didn't we find this out earlier? Ah ha, it was Fred's fault, he told them not to profile; sack him now" is very possibly an over-reaction:

        How much growth has there been in the use of the code since it was written? Is it now processing (k * 10) or more transactions compared to when it went into production? If so, and (being very, very crude) you can now shut down k whole servers, at the beginning you could have only saved 1/10th of a server. Less easy to shut that down!

        In other words, as time goes by, IF you experience growth, the balance of costs to fix versus savings made will alter.

        As to "Can the same person pull off the same trick again? Was he the only one who could?" - highly likely those are pointless questions, because if your system has reached the point where one (lowly) person could ever be in the position to randomly say "I wonder what happens if I profile this?" then what you have found is a management and planning failure: you've grown but haven't bothered to think about what growth does to a system. You haven't bothered to go back and check how many profile runs were just not done etc. If you pay a bonus to the engineer - and you should - you should take it out of the manager' bonuses (yeah, I know, that'll never happen).

        Reminder: the above is different from the situation that possibly more of us have been in, where everyone knows full well the program must get a speed-up, we even have all the customer complaints that it isn't keeping up. But nobody can figure out how to improve it, until Jim has his brilliant moment.

        1. An_Old_Dog Silver badge

          Re: Cost Savings

          In "Program Style, Design, Efficiency, Debugging, and Testing" (Dennie Van Tessel, 1974), the point was made that one should not optimize before profiling one's code, as one nearly-always guesses incorrectly where the hot spots are.

          "The Elements of Programming Style" (Brian W. Kernighan and P. J. Plauger, 1974) made the same point.

          Fifty years later, it still holds true.

          1. Bebu sa Ware
            Windows

            Re: Cost Savings

            Knuth's "premature optimisation is the root of all evil" also dates from 1974. Must have been the Zeitgeist then.

            1. Anonymous Coward

              Re: Cost Savings

              I would posit that this was the era of assembly optimization and weird bit-ops to try to eke out micro-optimizations. (id Software's square-root optimizations, for example.)
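
              For anyone who hasn't bumped into it, the square-root trick being alluded to is presumably the fast inverse square root made famous by Quake III - roughly this, rendered as a C++ sketch with memcpy to sidestep the strict-aliasing sin of the original (function name mine):

              #include <cmath>
              #include <cstdint>
              #include <cstdio>
              #include <cstring>

              // Bit-level initial guess via a magic constant, then one Newton-Raphson step.
              // Accurate to roughly 0.2% - a big deal on 1990s FPUs, pointless on modern
              // CPUs with hardware rsqrt instructions.
              float fast_inv_sqrt(float x) {
                  float half = 0.5f * x;
                  std::uint32_t bits;
                  std::memcpy(&bits, &x, sizeof bits);      // reinterpret the float's bit pattern
                  bits = 0x5f3759df - (bits >> 1);          // the famous magic constant
                  float y;
                  std::memcpy(&y, &bits, sizeof y);
                  return y * (1.5f - half * y * y);         // one Newton iteration
              }

              int main() {
                  std::printf("fast: %.6f  exact: %.6f\n", fast_inv_sqrt(9.0f), 1.0f / std::sqrt(9.0f));
              }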

              Other optimizations, like design, datastructure usage, and general consideration to the problem at hand perhaps fall into a different bucket.

              Part of the difference is that not doing any optimization - especially because things are "not in the hot path", with pointless loops and slowness "that only happens once, so who cares" - led to Windows applications taking *minutes* to start up. That was really only fixed with the advent of SSDs. (Even if it is fixed by something that just goes faster, is it wasted energy? - multiplied by how many global users?)

              In Meta's case, how many developer-*years* would have been paid for had some consideration been given to design before deployment? Perhaps even just proper use of "const" and such. Code review to identify easy wins, especially in design (as opposed to a blind thumbs-up).

              They say that it costs peanuts to fix an error when it's made, a little bit to fix an error if it's found in QA, and a WHOLE LOT to fix an error in production. Consider this optimization error an _error_, and it has a real cost. Design, implement, test, optimize, test, deploy? It seems markedly difficult to refactor for performance once something is built, as opposed to when it's designed. You're limited in what you can change - so you decide only trivial micro-ops are possible, because you can't do that, or that, or that, ...

        2. Anonymous Coward

          Re: Cost Savings

          > If you *think* you see an opportunity for optimisation in the first two steps, just make a note of it and carry on. As soon as it is working "correctly enough" to be functional, run it through the profiler. Then, and only then, locate the hot spots - both time and memory consumption, often intertwined - and improve those. Re-profile to find the next hot spots.

          I really dislike people that think like this.

          Just because you do something doesn't mean that you should do it *badly*. If you need to access one piece of data by name, you shouldn't use an array - you should use a hash. Yet you're basically saying the opposite: if you think of using a hash, you should use an array anyway. Too often I see people seeming to do exactly that.

          Part of doing it correctly, _even the first time_, is using proper (datastructure) tools for the job. It would only pass my PR review if it were an utterly inconsequential number of items (regardless of scale). An extreme example, but an unacceptable comment.
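
          To put it concretely (all names here invented), the difference I mean is roughly:

          #include <algorithm>
          #include <cstdio>
          #include <string>
          #include <unordered_map>
          #include <vector>

          struct User { std::string name; int id; };

          // O(n) per lookup: fine for a handful of entries, painful at scale.
          int id_from_vector(const std::vector<User>& users, const std::string& name) {
              auto it = std::find_if(users.begin(), users.end(),
                                     [&](const User& u) { return u.name == name; });
              return it != users.end() ? it->id : -1;
          }

          // Average O(1) per lookup: the right tool once lookups dominate.
          int id_from_map(const std::unordered_map<std::string, int>& users, const std::string& name) {
              auto it = users.find(name);
              return it != users.end() ? it->second : -1;
          }

          int main() {
              std::vector<User> v{{"ada", 1}, {"brian", 2}};
              std::unordered_map<std::string, int> m{{"ada", 1}, {"brian", 2}};
              std::printf("%d %d\n", id_from_vector(v, "brian"), id_from_map(m, "brian"));
          }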

          1. Vincent Manis

            Re: Cost Savings

            It's instructive to read the original Ken Thompson Unix 6ed kernel (in the Lions commentary). It's full of fixed-size arrays, linear searches, and obscure code, complete with the famous comment “You are not expected to understand this”. I am sure that if he'd had to go through elaborate PR vetting, Unix would have ended up on the dustheap of history.

            As for “using proper (datastructure) tools”, I taught data structures at the university level for 10 years. Making a copy of a datum, rather than just using a reference, can make dangling pointer and unexpected mutation errors go away. It's defensive programming at its best. In this case, the engineer found that the copy was unnecessary, and perhaps that could have been determined when the original code was written...but at least this story isn't about using eBPF to find why code was smashing memory.
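
            A tiny illustration of that trade-off (invented example): the reference is free, but the copy is the one that can't dangle or be mutated behind your back:

            #include <cstdio>
            #include <string>
            #include <vector>

            int main() {
                std::vector<std::string> names{"ken", "dennis", "brian"};

                const std::string& ref = names.front();   // cheap: no copy made
                std::string copy = names.front();         // defensive: owns its own data

                names.push_back("doug");                  // may reallocate the vector's storage

                // 'copy' is still valid; 'ref' may now dangle, and reading it would be
                // undefined behaviour - exactly the class of bug a defensive copy buys off.
                std::printf("%s\n", copy.c_str());
            }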

          2. Anonymous Coward

            Re: Cost Savings

            Ferris Bueller, you're my hero!

          3. Bent Metal

            Re: Cost Savings

            >Just because you do something doesn't mean that you should do it *badly* ...

            When I read the post this is referring to, I didn't take any suggestion to do it badly. I suspect this may relate to the eternal conundrum of "what is design vs. what is optimization".

            Most definitely you should take the easy wins, build a sensible first cut of the system - and not a deliberately poor one. But don't become lost in the weeds, spending too much of your time optimizing the living daylights out of one random particular feature. Because if you're not improving the actual bottleneck, you're focusing your time in the wrong area.

            And that's far from optimal.

        3. Doctor Syntax Silver badge

          Re: Cost Savings

          > The order is: make it work; make it work correctly; make it work fast.

          > If you *think* you see an opportunity for optimisation in the first two steps, just make a note of it and carry on. As soon as it is working "correctly enough" to be functional, run it through the profiler. Then, and only then, locate the hot spots - both time and memory consumption, often intertwined - and improve those. Re-profile to find the next hot spots.

          And there's the net through which other issues might slip - if it looks good enough it probably passes as "correct", and we end up with stuff that doesn't scale. I remember an incident that resulted in a weekly job starting to fall over as the business grew. Fortunately the application code was available. It took an hour or so to go through it and discover that, in a deeply nested loop, the application was causing the database engine to allocate memory that was never freed - at least not until the application exited, which it did when the engine crashed, which it did when the OS refused to give it more space. It probably looked to be working "correctly", whatever that might mean, but it didn't run at scale.

          OK, a programming error causing a memory leak isn't new, although causing a leak in a different process is a little more exotic. But aren't failures to free after use (and use after free) best dealt with by getting it right in the first place rather than hoping to pick them up in a second pass?
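
          The shape of that bug, with an entirely made-up cursor API standing in for the real database engine (only the missing release matters; everything else is invented so the sketch compiles):

          #include <cstdio>

          // Hypothetical handle API, stubbed out here - in the real story the
          // allocation lived inside the database engine's own process.
          struct DbCursor { char scratch[4096]; };
          DbCursor* db_open_cursor(const char* /*query*/) { return new DbCursor; }
          int       db_fetch(DbCursor* /*cursor*/)        { return 0; }
          void      db_close_cursor(DbCursor* cursor)     { delete cursor; }

          void weekly_job(int customers, int orders_per_customer) {
              for (int c = 0; c < customers; ++c) {
                  for (int o = 0; o < orders_per_customer; ++o) {
                      DbCursor* cur = db_open_cursor("SELECT ...");
                      db_fetch(cur);
                      // BUG: no db_close_cursor(cur) - invisible with a hundred customers,
                      // an out-of-memory crash once the business grows.
                  }
              }
          }

          int main() { weekly_job(100, 10); }   // ~4 MB leaked here; scale it up and it falls over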

      2. An_Old_Dog Silver badge

        Re: Cost Savings

        1. That (figurative) bubble sort which a programmer threw in to "just make it work" might have given sufficient performance with the short lists and relatively few times it was called when the program was first released, but became a tractor-brake with the longer lists which in turn were brought about by modifications, refactoring, and upscaling (a rough sketch of this follows below).

        2. This person was not necessarily a coding genius, but a programmer, in the right place, at the right time, who saw the relevant code and said to himself, "Hey, wait a minute ..." For companies to increase the number of such lightning-strikes, they have to invest in programmer training -- training in how to design and program, not just, "learn this trendy programming language", or, "learn this IDE".

        Most companies have not been doing that how-to training.
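
        The rough sketch promised in point 1 - the "just make it work" sort against the library one, with sizes invented for illustration:

        #include <algorithm>
        #include <chrono>
        #include <cstdio>
        #include <random>
        #include <vector>

        // Fine for a dozen items at first release; a tractor-brake once the lists grow.
        void bubble_sort(std::vector<int>& v) {
            for (std::size_t i = 0; i + 1 < v.size(); ++i)
                for (std::size_t j = 0; j + 1 < v.size() - i; ++j)
                    if (v[j] > v[j + 1]) std::swap(v[j], v[j + 1]);
        }

        long long time_ms(void (*sorter)(std::vector<int>&), std::vector<int> v) {  // copy is deliberate
            auto t0 = std::chrono::steady_clock::now();
            sorter(v);
            auto t1 = std::chrono::steady_clock::now();
            return std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
        }

        int main() {
            std::mt19937 rng(42);
            std::vector<int> v(20'000);
            for (int& x : v) x = static_cast<int>(rng() % 1'000'000);
            std::printf("bubble sort: %lld ms\n", time_ms(bubble_sort, v));
            std::printf("std::sort:   %lld ms\n",
                        time_ms([](std::vector<int>& w) { std::sort(w.begin(), w.end()); }, v));
        }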

      3. Bebu sa Ware
        Coat

        Re: Cost Savings

        I would suggest that Meta developers leave the ampersands off until they might need the references.

        The Rustards would likely point out that this would not have occurred with Rust, which is probably true - I'll gladly concede the point to them.

  2. HuBo Silver badge
    Gimp

    Remarkable stuff!

    Yeah, Roger Booth had that as a C++ Gotcha on Medium last year, with "'auto' and the &" exemplified by (the 1-char difference):

    auto first_big = get_first(big_vec); // does possibly needless copy

    auto& first_big = get_first(big_vec); // doesn't do needless copy

    He suggested that C++17's copy elision might help too.
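
    Padded out into something compilable (get_first and big_vec as in that example, the rest invented):

    #include <cstdio>
    #include <vector>

    const std::vector<int>& get_first(const std::vector<std::vector<int>>& vv) {
        return vv.front();
    }

    int main() {
        std::vector<std::vector<int>> big_vec(1'000, std::vector<int>(10'000, 1));

        auto first_copy = get_first(big_vec);   // auto deduces std::vector<int>: a full copy
        auto& first_ref = get_first(big_vec);   // auto& deduces const std::vector<int>&: no copy

        std::printf("%zu %zu\n", first_copy.size(), first_ref.size());
    }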

    Good thing though that Meta's "seasoned performance engineer" protagonist was profiling to the eBPF Strobelight, rather than, say, checking out chicks/dudes while doing The Safety Dance, pogo, or suchlikes!

    P.S. cool link under: "a number of arguable significance"

    1. Anonymous Coward

      Re: Remarkable stuff!

      Interesting, I hadn't thought that it could be the return value.

      I'm only "used" to seeing objects passed to functions, and so I was expecting:

      `void myfunc(myclass abc) {...}`

      `void myfunc(myclass& abc) {...}`

      but what you showed is equally applicable. Perhaps there's an opportunity for Meta to get twice the savings! (I wonder if they'll offer you a job to save them money again and again! Or maybe hire you to type the character and then lay you off for further cost-savings..)

  3. Anonymous Coward

    Ampersand

    So, that'll be making a function take a ref instead of passing a copy of the object.

    Yup, know that one. The MFC code that came with the Microsoft VC6 compiler seemed to be written by someone totally unaware of passing by ref - let alone a const ref. Programs that did a lot of work with, for example, time series data, passing around CTime instances, could be sped up ten- or a hundredfold just by editing the MFC sources, putting in lots and lots of ampersands and consts, and recompiling it. That drove the point home.

    In later years, away from MFC, I could still always guarantee to save time & memory by doing the same to code written for us - saving the copying of some complicated type built out of Boost/STL, where the authors did not know the size of the structures or the cost of copying them. Worse, they never profiled to find out. And you don't need to resort to eBPF to profile most programs.

    And, no, this does not prove that C++ is bad: many languages provide both call-by-reference (ref or const ref) and call-by-value (make a copy you can modify without affecting the caller), and the same mistakes are possible in all of them.
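
    A low-tech way to watch that happen, no profiler needed - a heavyweight stand-in type (everything here invented) with a copy counter bolted on:

    #include <cstdio>
    #include <vector>

    struct TimeSeries {                                   // stand-in for a CTime/Boost-heavy type
        std::vector<double> samples;
        static inline long copies = 0;
        TimeSeries() : samples(100'000) {}
        TimeSeries(const TimeSeries& other) : samples(other.samples) { ++copies; }
    };

    double by_value(TimeSeries ts)            { return ts.samples[0]; }   // copies the lot, every call
    double by_const_ref(const TimeSeries& ts) { return ts.samples[0]; }   // copies nothing

    int main() {
        TimeSeries ts;
        for (int i = 0; i < 1000; ++i) by_value(ts);
        std::printf("copies after by-value calls:     %ld\n", TimeSeries::copies);   // 1000
        for (int i = 0; i < 1000; ++i) by_const_ref(ts);
        std::printf("copies after by-const-ref calls: %ld\n", TimeSeries::copies);   // still 1000
    }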

  4. Dan 55 Silver badge

    "expensive array copies that happen unintentionally with the 'auto' keyword"

    The same would happen if the variable were defined as a vector too, wouldn't it? It seems to me that the type makes no difference, someone just forgot to specify that it is a reference.
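
    A minimal illustration (names invented): writing the type out changes nothing, only the missing & does:

    #include <vector>

    std::vector<int> table(1'000'000, 42);
    const std::vector<int>& get_table() { return table; }

    int main() {
        auto a = get_table();                      // copy
        std::vector<int> b = get_table();          // exactly the same copy, type spelled out
        const std::vector<int>& c = get_table();   // the & is what actually avoids it
        return (a.size() == b.size() && b.size() == c.size()) ? 0 : 1;
    }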

    1. that one in the corner Silver badge

      Re: "expensive array copies that happen unintentionally with the 'auto' keyword"

      The issue is that if you write out the entire type you start to think "hang on, this declaration has four nested containers, how big is it, shouldn't I be taking care here?".

      If you just fling "auto" everywhere you don't think at all about whether you've just copied an int, a float - or a complete DOM of a gigabyte XML file.

      "auto" is neat for making some templates work, but just flinging it around everywhere because you can't be fagged to find out what data structures you are playing with - worse "it just saved me typing" - is a recipe for wasted resources and potential costly disaster.

  5. Doctor Syntax Silver badge

    "eBPF allows the safe injection of custom code into the kernel"

    Just wait till Poettering discovers it.
