Moore's Law is deader than corduroy bell bottoms. But with a bit of smart coding it's not the end of the road

In his 1959 address to the American Physical Society at Caltech, physicist Richard Feynman gave a lecture titled, "There's Plenty of Room at the Bottom," laying out the opportunity ahead to manipulate matter at the atomic scale. The semiconductor industry since then has made much of miniaturization, increasing the density of …

  1. ovation1357

    Could this finally mean an end to the programmers who get away with terrible code by arguing that they can simply throw more RAM and a faster CPU at it?

    I'm yet to be convinced by the old strategy of only optimising code which is deemed to be too slow. I'm not saying that everything has to be carefully written in assembler, but I've seen so many code abominations which technically work but are hugely wasteful of resources and become extremely difficult to maintain and support.

    1. Anonymous Coward
      Devil

      Hammer ISO Nail

      "Could this finally mean an end to the programmers who get away with terrible code by arguing that they can simply throw more RAM and a faster CPU at it?"

      Of course not. Terrible code will be with us always.

      As to the study, I'm not surprised that programmers think the solution is programming.

      Chip designers, OTOH, think the solution is chip design. The current push is for multi layered chips with interlayer connections between the transistors.

      Then there are the engineering boffins who think spintronics will solve the problem, given enough funding. https://www.sciencedaily.com/releases/2020/06/200603122949.htm

      And, of course, the bean counters vote for cheap programmers, cheap chips, and no research that won't affect the current quarter.

      Same as it ever was.

    2. LucreLout
      Pint

      Could this finally mean an end to the programmers who get away with terrible code by arguing that they can simply throw more RAM and a faster CPU at it?

      As a programmer I do hope so.

      While I'm here though, here's a pint for all the generations of CPU designers and hardware engineers who have brought us so far as they have. Top quality work folks!

    3. The Man Who Fell To Earth Silver badge
      WTF?

      So what's the point other than Python & Java suck at calculations?

      I downloaded the paper and I didn't see any info on the precision of the calculation.

      So I wrote a little program in the oldest, least maintained compiler I have access to (PB, which hasn't had a compiler update in a decade or so and only generates 32-bit executables). My version dimensions the three matrices as 4096-by-4096, initializes A & B with data of the appropriate precision (so they are not just full of zeros), then times how long it takes to matrix-multiply A & B and assign the result to C using FOR loops as they do in the paper (as opposed to PB's built-in matrix operators). I wrote it as a single-threaded 32-bit application. I redid it with the matrices & math at single, double & extended precision. Ran it on a 7-year-old laptop that sports an i7-3632QM @2.20GHz, running Windows 10 Pro 64-bit, 1909 (build 18363).

      Matrices declared as Single Precision: 732.2969 seconds (= 12.2 minutes)

      Matrices declared as Double Precision: 871.9297 seconds (= 14.5 minutes)

      Matrices declared as Extended Precision: 1076.062 seconds (= 17.9 minutes)

      The PB compiler does all floating-point calcs in extended precision (10 bytes), so one would not expect huge speed differences between matrices declared as single (4 bytes), double (8 bytes) or extended (10 bytes) precision; the difference in work is mostly type-conversion overhead.

      But it once again underscores that Python & Java suck at numerical calculations.
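
      For reference, the sort of plain-Python triple loop the paper times looks roughly like this - a minimal sketch at a toy size (the 4096-by-4096 case takes hours in pure Python), so treat the matrix size and timing harness as illustrative assumptions:

      ```python
      # Rough sketch of the naive triple-loop multiply discussed above.
      # N is kept small here; the paper's example uses N = 4096.
      import random
      import time

      N = 256
      A = [[random.random() for _ in range(N)] for _ in range(N)]
      B = [[random.random() for _ in range(N)] for _ in range(N)]
      C = [[0.0] * N for _ in range(N)]

      start = time.perf_counter()
      for i in range(N):
          for j in range(N):
              s = 0.0
              for k in range(N):
                  s += A[i][k] * B[k][j]
              C[i][j] = s
      print(f"{time.perf_counter() - start:.2f} s for N={N}")
      ```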

      1. Tom 7

        Re: So what's the point other than Python & Java suck at calculations?

        I ran some tests too - but I used a math library that was written in C++ to do the matrix calcs, and it ran only a fraction of a second slower than an optimised C++ version, but was about 2000 times faster to write! That's some of what the author was trying to get across, I think!

        1. DavCrav

          Re: So what's the point other than Python & Java suck at calculations?

          A very clever guy called Richard Parker has spent a significant amount of time trying to optimize a matrix multiplication algorithm, but over finite fields rather than using doubles. (Think {0,1} if you don't know what a finite field is.)

          He has managed to improve the current best algorithm, as implemented, by orders of magnitude. This is used for multiplying matrices in the hundreds of thousands of dimensions, rather than a few thousand.

          The first thing is that you don't need to do as many rounds of Strassen as you might expect - only half a dozen at most. And then the problem becomes chopping your matrices up into the correct-sized blocks so you work just inside each level of cache. You really have to worry about the distance between the processor and the memory when you are doing things like this and trying to optimize it. But you also need code that works on lots of different computers, so he has to make conservative estimates of the amount of L1/2/3 cache, etc. so that it always works.

          He has to work in assembler, because he can't trust even a C compiler not to stuff up his code.

          It's fascinating stuff, but a little too close to the coalface for my taste.
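
          For readers who haven't met the blocking idea: the point is to multiply in tiles small enough to stay resident in a fast level of cache rather than streaming whole rows and columns through it. A rough Python/NumPy sketch of the tiling only - nothing to do with Parker's assembler, and the tile size here is an arbitrary assumption:

          ```python
          # Toy illustration of cache blocking: multiply in BS-by-BS tiles so each
          # tile can stay hot in cache. A real implementation tunes BS to the cache
          # sizes and works at a far lower level than NumPy.
          import numpy as np

          def blocked_matmul(A, B, BS=64):
              n = A.shape[0]
              C = np.zeros((n, n))
              for i in range(0, n, BS):
                  for j in range(0, n, BS):
                      for k in range(0, n, BS):
                          C[i:i+BS, j:j+BS] += A[i:i+BS, k:k+BS] @ B[k:k+BS, j:j+BS]
              return C

          A = np.random.rand(512, 512)
          B = np.random.rand(512, 512)
          assert np.allclose(blocked_matmul(A, B), A @ B)
          ```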

        2. Cuddles

          Re: So what's the point other than Python & Java suck at calculations?

          "That's some of what the author was trying to get across I think!"

          That's the impression I got. Anyone familiar with Python should be aware that while it's an easy one to learn, there's a reason pretty much all the heavy lifting is done by calling C libraries. But it's interesting to see a more quantitative look at exactly how much speedup you can get from specific changes.

          As for HildyJ's point about programmers saying programming is important and chip designers saying chip design is important, the obvious answer is that neither is particularly useful in isolation. This study gives a nice example of that. Parallelising the code to use all the cores gives a nice speedup. If the chip doesn't have multiple cores that obviously isn't going to help, while if the programmer doesn't use them it doesn't matter how many extra chips you add. Hardware and software have to develop together, otherwise you'll just end up with hardware providing options that aren't used by the code, and code trying to do things that aren't physically possible on the hardware.

      2. John H Woods Silver badge

        Re: So what's the point other than Python & Java suck at calculations?

        Smalltalk is always my go-to --- 'notoriously' slow but very powerful and easy to knock something up.

        1966 seconds for a single-threaded effort on an aging i5 whilst I was doing something else. Come on, is Python really that bad? I was going to try to learn it!

        Maybe before we try to improve our algorithms (and there's some pretty clever stuff you can do with matrices) we should ensure our languages aren't taking the mickey.

        1. Someone Else Silver badge

          Re: So what's the point other than Python & Java suck at calculations?

          Come on, is Python really that bad?

          No. But if you want balls-to-the-wall performance, look elsewhere. You know, matching the tool to the job....

      3. TimGJ

        Re: So what's the point other than Python & Java suck at calculations?

        Ordinary Python does suck at number crunching. That's why we use numpy, so you get the efficiency of C++ with the convenience of Python.

        The completely meaningless and cherry-picked example of the 4096x4096 matrix multiplication takes 37s to execute on a single core of an AMD 3900X.
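
        For anyone who hasn't seen it, the NumPy version is essentially a one-liner that hands the work to an optimised BLAS underneath - a minimal sketch, with timings obviously varying by machine and BLAS build:

        ```python
        # The same 4096x4096 multiply handed to NumPy/BLAS instead of Python loops.
        import time
        import numpy as np

        N = 4096
        A = np.random.rand(N, N)
        B = np.random.rand(N, N)

        start = time.perf_counter()
        C = A @ B  # dispatched to the underlying BLAS library
        print(f"{time.perf_counter() - start:.2f} s")
        ```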

      4. Someone Else Silver badge

        Re: So what's the point other than Python & Java suck at calculations?

        Wonder how this would go using APL (an unreadable, unwritable language optimized for matrix manipulations)?

        It'd probably go like a bat out of hell...once you got it working. But writing and debugging the damn thing would cost more than any runtime gains you might realize.

    4. Anonymous Coward
      Anonymous Coward

      the problem is (always) manglement

      In my experience it is the product managers who typically shy away from putting aside the necessary time for performance tuning. Security is also low on the list of priorities.

      Heck... (anon mode enabled) Last week we received an e-mail from our department mangler. He told us that a different product from within our company had been exposed, and now every product would be assigned somebody to be responsible for keeping third-party software up-to-date. This task would not require any planning, because it would take only five minutes (tops!) per week.

      Never mind that our team has a security shitlist containing about a year's worth of work if we were to start shoring up security properly (we still have some ancient SQL-injection vulnerabilities. We hope that the remaining ones are impossible to exploit, but the low-hanging fruit has all been picked and those that are left are difficult/time-consuming to wrestle with). And never mind that it takes considerable time to convince internal IT to run our products in a secure manner. I spent a day alone convincing some bastards to shut down port 80 on an externally facing web application. "Why bother? The customers have been told to use https!"

      Jizusfreakingholymarycow.

      Same team mangler argued that we should use http internally between our microservices. "https carries a 20% overhead!". That guy incidentally has the confidence of Clint Eastwood, but we have noticed he says a lot of crap that just does not add up. We believe HR never checked his CV or references. The guy is a danger to himself and others - a potential Darwin award candidate.

      I care less and less. May these bastards rot in hell. In my experience, once a company reaches about 30 employees, the rot starts setting in.

      I'll keep my head down and write as good code as I can manage, but I feel my days are numbered.

      1. Anonymous Coward
        Anonymous Coward

        Re: the problem is (always) manglement

        Obligatory Dilbert.

        1. Claptrap314 Silver badge

          Re: the problem is (always) manglement

          I figured you were going for "If we eliminate the design & testing phases, we can hit the release window", but this is good, too.

    5. big_D Silver badge

      Run it in Fortran on a VAX, it will take less than a second... The optimizing compiler back then compiled a similar demo down to a single NOP wrapped in an .exe bundle.

      The same program running on a much more powerful mainframe took several days.

      The DEC compiler worked out that with 1) no input, 2) fill a matrix with values, 3) no output, it could optimize out part 2 because it wasn't needed, which left parts 1 and 3, and those optimized down to a NOP (no output), or an empty executable.

    6. Kubla Cant

      I'm yet to be convinced by the old strategy of only optimising code which is deemed to be too slow

      The trouble is that the programmers who currently write bad, inefficient code will produce something even worse if they're encouraged to "optimise" it.

    7. Someone Else Silver badge

      Could this finally mean an end to the programmers who get away with terrible code by arguing that they can simply throw more RAM and a faster CPU at it?

      No.

      Next question?

  2. rcxb Silver badge

    the seven hour number crunching task can be reduced to 0.41s, or 60,000x faster than the original Python code.

    Yes, but how many hours of programmer time did it take to do the optimization, and how much money does a few hours of a programmer's time cost versus a few hours of a single CPU core?

    I'm all for simple and efficient programming, but processors are rather ridiculously fast now. The only thing I concern myself with, performance-wise, is why my web browser takes so damn long to load what should be a simple Amazon product page, with its delayed loading of tons of javascript that happens AFTER I've started typing into the search box, messing things up, and not knowing when it's actually finished.

    Longer-term, I'm sure chipmakers aren't going to just give up. Quantum computers are in the works, much has been said of optical, shortening pipeline or increasing L1 cache helps, and there's potential for exotic layouts like 3D multi-layer chips to give a speed boost as well. With many billions to be made, there won't be a shortage of R&D when we actually hit the wall.

    1. RM Myers

      This seems like a case of "horses for courses". If you are writing the typical web applications, then the current optimizations built into the programming languages are probably more than enough. If you are writing operating system kernels, then you may well want to aggressively tune your algorithms. And if you are writing large scientific applications running full bore on supercomputers costing tens to hundreds of millions, then you probably want the type of optimizations mentioned in this article.

      1. Carlie J. Coats, Jr.

        If only...

        Except that many of these supercomputing applications are written to suit the hardware-behavior and compiler-limitations of 1980's vintage vector machines. Unnecessarily.

        Which is why my version of the WRF weather-model is so much faster than NCAR's...

      2. big_D Silver badge

        I've worked on optimizing a few web projects where the "built-in" optimizations weren't enough, because the code had been written to be elegant and human readable, with no thought about how "executable" the code was.

    2. vtcodger Silver badge

      Things grow ... until they don't

      Moore's law is just exponential growth. It's probably best stated as "a lot of things tend to grow at a constant rate ... until they don't." If you want an equation, try X = R^T, where R is a growth rate and T is a time. For Moore's Law - the growth in the number of "transistors" in a given area of an IC - R is about 1.414 per year, so for T = two years, X = 1.414^2 = 2.0, i.e. a doubling every two years.

      What about "... until they don't"? Well, things genuinely don't grow exponentially forever. But feature density has managed a pretty good run -- 60 years. Will it continue? For how long? Who knows?
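
      A quick sanity check of that arithmetic (not from the article, just the numbers above):

      ```python
      # R ~ 1.414 per year gives a doubling every two years; sixty years of
      # doubling every two years is 2**30, roughly a billion-fold increase.
      R = 2 ** 0.5
      print(R ** 2)          # ~2.0
      print(2 ** (60 / 2))   # ~1.07e9
      ```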

      1. Milo Tsukroff
        Big Brother

        Re: Things grow ... until they don't

        > feature density has managed a pretty good run -- 60 years.

        > Will it continue? For How long? Who knows?

        Now that chip making has moved to China, indeed, who knows? How can anyone know? 'Nuff said.

      2. Captain Hogwash

        Re: Things grow ... until they don't

        Did you know that disco record sales were up 400% for the year ending 1976, if these trends continue...AY!

    3. Doctor Syntax Silver badge

      "Yes, but how many hours of programmer time did it take to do the optimization, and how much money does a few hours of a programmer's time cost versus a few hours of a single CPU core?"

      And how much is the user's time worth whilst they wait for a task to complete?

      The programmer only has to optimise it once. Many users may use the program many times.

      1. Carlie J. Coats, Jr.

        Performance limits

        And because of parallel overhead (for example), that inefficiency sets limits on the best turnaround time. When you plot turnaround time against the number of processors, you normally get a U-shaped curve: at first, adding processors cuts your turnaround time, but eventually the parallel overhead kicks in enough that adding more processors adds to your turnaround time. 64 processors may well be slower than 32 in that instance.

        The way to fix it is better algorithms and better coding, to push the whole curve downward toward the X-axis.

        Twenty years ago or so, I reviewed a paper on someone's parallelization of an atmospheric-chemistry model. Their best performance was happening at the 16-processor level. But. A different and equivalent model I know achieved better performance than that one on just 2 processors -- and scaled better, so that its performance-curve was best at 32 processors. If your model is too inefficient to begin with, it doesn't matter how many processors you throw at it, you are limited by that initial inefficiency.
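
        The U-shaped curve is easy to reproduce with a toy model: a fixed amount of work divided across p processors plus a per-processor coordination overhead. The constants below are made up purely for illustration:

        ```python
        # Toy turnaround model: work/p for the parallel part plus an overhead term
        # that grows with the number of processors. Entirely invented numbers,
        # just to show why the curve turns back up.
        work = 1000.0   # arbitrary units of serial compute
        overhead = 0.5  # made-up per-processor coordination cost

        def turnaround(p):
            return work / p + overhead * p

        for p in (1, 2, 4, 8, 16, 32, 64, 128):
            print(f"{p:4d} processors: {turnaround(p):8.2f}")
        # The minimum sits near p = sqrt(work / overhead); past that point,
        # adding processors makes the job slower - exactly the U shape described.
        ```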

        1. Rol

          Re: Performance limits

          Anyone with a touch of Asperger's would vomit uncontrollably at the sight of the x86 instruction set.

          When it comes to inefficiencies, the instruction set that has been used as much as a weapon by the dominant player, Intel, against its competitors is one of the most inglorious examples of how not to develop technology.

          If we were developing a CPU today, with no regard for the legacies of the past, then without any added leaps of science we could turn the x86 into a beast that would more than double the performance of any application at a stroke. Of course the application would need to be compiled using a new compiler that mapped to a more efficient use of the bytes set aside for instructions, and obviously allowed for many more instructions to be added - but this time by a consensus of the players, not individual companies looking to steal an edge on their competitors.

          Seeing as the industry is transfixed with the idea of obsoleting everything they can in the search for greater revenues, it beggars belief that the x86 instruction set has not been completely overhauled and set on a more steady course, with a governing body overseeing additions to the universal set.

          1. Anonymous Coward
            Anonymous Coward

            Re: Performance limits

            Intel tried to replace it, moving stuff into software to improve performance - that's why we now all use AMD's x86 64-bit CPU instruction set.

          2. Mike 16

            Re: Performance limits

            Replacing x86 (and x86_64 aka amd64)? Go for it!

            https://www.ebay.com/b/Intel-Itanium-Computer-Processors-CPUs/164/bn_5757133

            Maybe compromise with a nice i860 box?

            Yeah, barely touched a 4U Itanium server back in the day. Wouldn't mind picking up an Itanium VMS pizza-box, but really only for much the same reason as keeping my KIM-1 running.

            1. bombastic bob Silver badge
              Devil

              Re: Performance limits

              yeah, even with ARM gaining ground in the world of computing prowess, x86 and amd64 instruction sets are on the vast majority of high-performance personal computing devices.

              There are reasons for this. Even though 'back in the day' everyone thought RISC would solve performance issues by acting more like microcode in the actual applications, they apparently forgot that instruction fetching takes time, too, and when it takes 2 or 3 or 4 instructions in a RISC architecture to do what 1 instruction does in CISC architecture, the lines of performance benefit get blurry. And of course, Intel and AMD made their pipelines and caching more efficient.

              The article got it right early on: the tech industry needs software performance engineering

              Right, Micros~1?

              People will buy a new computer when it's perceived to be faster (and better) than what they already have; otherwise, you only need to maintain the old one. And statistics on desktop usage focus on new sales, NOT on existing users that simply fix their existing boxen. This skewed the numbers, causing many bad decisions to be made, not the least of which is the assumption that performance DID NOT matter as much as "feature creep" and "new, shiny" features. And here we are today!

              1. Nick Ryan Silver badge

                Re: Performance limits

                It's made more complicated because, in order to handle the ridiculously complicated and often edge-case instructions cobbled into the extended x86 instruction set, both AMD and Intel implemented much of it using a simplified RISC instruction set underneath. Much easier to validate this and to performance-tweak it.

                1. Anonymous Coward
                  Anonymous Coward

                  Re: Performance limits

                  But also easier to fetch, as increasingly it's the stuff outside the CPU that's the bottleneck for performance as memory lost the ability to keep up with the CPU around the time of 80486DX2, the first of the clock-doublers. What the other poster was saying is that RISC loses its simplicity advantage when it gets choked by the memory. At this point, it's a general wash when it comes to computational efficiency because that's not the main chokepoint anymore. What you really need, especially in HPC, is a lot of memory bandwidth, and Intel/AMD currently have the ARM chips beat (because, like it or not, memory handling needs power). For ARM to compete, they need chipsets specifically built for high memory bandwidth, and that's a relatively new field in ARM.

          3. John Smith 19 Gold badge
            Unhappy

            "it beggars belief that the x86 instruction set has not been completely overhauled"

            But it's really all Microsoft know how to code for, of course.

            They don't call them "Code museums" for nothing.

            And BTW the sort of orthogonal instruction set you're talking about was developed by both Motorola in the M68000 and IBM in the PowerPC ISAs.

            1. Ozzard
              Boffin

              Re: "it beggars belief that the x86 instruction set has not been completely overhauled"

              Take a look at the PDP-11 instruction set vs. the M68000. You might be surprised by the similarities - nothing new under the sun.

              (Nope, still missing the "old fogey" icon)

              1. Nick Ryan Silver badge

                Re: "it beggars belief that the x86 instruction set has not been completely overhauled"

                I remember the pleasure of not having to spend more CPU cycles juggling a pitiful number of limited registers around rather than doing anything constructive with them. To "fix" this, they just added more and more complicated instructions that - after glaring at them for a while, holding the documentation sideways and waving a dead chicken at them - I'm often at a loss as to why some of them exist, other than for the very occasional edge case.

                1. Anonymous Coward
                  Anonymous Coward

                  Re: "it beggars belief that the x86 instruction set has not been completely overhauled"

                  I believe Parkinson's Law applies here. No matter how many registers you have in a processor, the jobs you have will expand to take them all up until you have to juggle them all over again.

              2. Anonymous Coward
                Anonymous Coward

                "the PDP-11 instruction set vs. the M68000."

                "the PDP-11 instruction set vs. the M68000." "nothing new under the sun"

                ?

                PDP11: 16bit instruction and operand and address size, any general register (R0-R5) could be used for any purpose with any addressing mode (lots to choose from, including indexed and pre-decrement and post-increment). R6 was conventionally the stack pointer, R7 was the PC. Both R6 and R7 could be used with the same addressing modes as other general registers. What might MOV -(PC), -(PC) do?

                Almost everything you need to know about PDP11 instruction set architecture could be found on the PDP11 Programming Card, still available on the Interweb at e.g.

                https://www.montagar.com/~patj/dec/pocket/pdp11_programmingcard_1975.pdf

                68K: various sizes of operand (mostly 32bit), lots of 32bit registers (but some separate registers for addresses and data), some (but not total) flexibility in register usage and addressing modes.

                I could go on, but what's the point.

                What was your point again?

    4. Kevin McMurtrie Silver badge

      Plenty of companies are paying millions of dollars a year in compute costs. The ones that will still be around tomorrow don't want to hear any crap about scaling up to accommodate lazy code.

      It doesn't matter how fast computers are. You're in trouble if your competitor can make them run even 50% faster.

      1. Anonymous Coward
        Anonymous Coward

        Damage limitation

        Actually if your software is wrong or unreliable, the slower the computers run it the better.

      2. LucreLout

        The ones that will still be around tomorrow don't want to hear any crap about scaling up to accommodate lazy code.

        On the one hand I agree with you, on the other hand "Javascript programmer".

    5. LucreLout

      Could this finally mean an end to the programmers who get away with terrible code by arguing that they can simply throw more RAM and a faster CPU at it?

      For the 10x increase you're basically talking about it being freely available. Most programmers who know Python will also know, or can quickly learn, sufficient .NET or Java to see gains.

      To be honest, multithreading the code provided should be within the capacity of any proper programmer in any proper language to do quickly and accurately, which will see you hitting way north of the 47x improvement because you'll be able to leverage n cores on the CPU, meaning you should be able to get closer to a 47 * n speed increase, depending on what other compute is happening on the box. (ETA: You will of course never actually get 47 * n.) Thereafter, doing a distributed calculation across horizontally scaled boxes should allow for significant real-world gains if you still need more power.... or just learn assembler.
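
      As a rough illustration of the splitting only (a sketch, not the paper's code: in pure Python the pickling and process start-up overhead can eat much of the gain, and real code would hand the inner loops to a compiled library):

      ```python
      # Split the naive multiply across cores by giving each worker a block of
      # rows of A. Toy size; matrix contents are random just to have data.
      import multiprocessing as mp
      import random

      N = 256  # the paper's example uses 4096

      def multiply_rows(args):
          """Compute one block of rows of A @ B with plain loops."""
          A_rows, B = args
          n = len(B)
          out = []
          for a_row in A_rows:
              row = [0.0] * n
              for k in range(n):
                  aik = a_row[k]
                  bk = B[k]
                  for j in range(n):
                      row[j] += aik * bk[j]
              out.append(row)
          return out

      if __name__ == "__main__":
          random.seed(0)
          A = [[random.random() for _ in range(N)] for _ in range(N)]
          B = [[random.random() for _ in range(N)] for _ in range(N)]
          cores = mp.cpu_count()
          step = (N + cores - 1) // cores
          chunks = [(A[i:i + step], B) for i in range(0, N, step)]
          with mp.Pool(cores) as pool:
              blocks = pool.map(multiply_rows, chunks)
          C = [row for block in blocks for row in block]
      ```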

      Programmers that require hardware to bail them out of performance problems could do the industry a massive favour and move to another profession.

      1. Tomato42

        there's also the problem of subscripting arrays in Python being notoriously slow; I wouldn't be surprised if just rewriting the code to use iterators and list comprehensions sped it up by a factor of 10
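
        For example, an index-heavy loop versus a zip/generator version of the same dot product - a toy comparison, and the factor you actually get depends on the code in question:

        ```python
        # Index-based loop vs. a zip/generator version of the same dot product.
        # The second avoids the repeated a[i]/b[i] subscript lookups.
        import random
        import timeit

        a = [random.random() for _ in range(100_000)]
        b = [random.random() for _ in range(100_000)]

        def dot_indexed(a, b):
            s = 0.0
            for i in range(len(a)):
                s += a[i] * b[i]
            return s

        def dot_zip(a, b):
            return sum(x * y for x, y in zip(a, b))

        print(timeit.timeit(lambda: dot_indexed(a, b), number=100))
        print(timeit.timeit(lambda: dot_zip(a, b), number=100))
        ```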

        1. LucreLout

          Quite possibly it will. I guess there comes a point in any software performance curve where you need to be realistic about a language's relative horsepower. Sometimes Python won't get you there, so you need Java or .NET. Sometimes they won't get you there, so you need C. Sometimes that won't get you there, so you go for assembler.

          The most efficient trick is understanding that before you start coding, so you can make a realistic determination of language/framework trade-offs relating to speed of execution vs speed of development vs cost of maintenance. Let's face it, if I had 25 years of assembler or C experience, I'd be wanting to earn more than I do as a .NET/Java/Python dev, which should be a factor.

      2. jelabarre59

        Programmers that require hardware to bail them out of performance problems could do the industry a massive favour and move to another profession.

        Yes, but they'll just move into management.

    6. John Brown (no body) Silver badge

      "Yes, but how many hours of programmer time did it take to do the optimization, and how much money does a few hours of a programmer's time cost versus a few hours of a single CPU core?"

      You seem to be comparing "expensive" optimisation against a single run of a programme. How do you think the numbers stack up when the programme is run every day for a year? Maybe it runs multiple instances for different users? Or it's a product you sell and it's run millions of times per day all over the world? So maybe it cost a few grand to optimise, but it saves millions of hours of CPU time around the world, making your product that much more competitive than others so you sell more.

      1. gvp

        Rule of a million

        Back in the day when I was doing this stuff, I applied a rule of thumb that I called the rule of a million.

        Does it run a million times a year, or does it use more than a million CPU seconds when our largest client runs it? Optimise.

        The rule of a million stood next to the rule of three: if I can't think of three qualitatively different ways to solve a problem, I don't understand the problem. More investigation required.

        (The latter rule almost always led to me using the fifth or sixth approach that I thought of. Think of two ways, others are hard to see. Three, it gets easier.)

    7. big_D Silver badge

      It depends on what you are doing. If it is a one-off quick and dirty calculation, then the optimization probably doesn't matter.

      If it is a system used by hundreds or thousands (or web based, possibly millions of users), then the time invested by the programmer is very cheap, if he can bring down the processing time.

      Two examples:

      1) a set of financial reporting tools, run every month on users' computers, blocking each computer from all other use during that time. Runtime before optimization: 22 hours, times 250 financial users around the world every month. Optimization: 1 programmer for 2 weeks. Runtime after the optimization: < 3 hours. A saving of 4,750 processing hours per month and around 2,000 man-hours of recovered activity involving using their computers. That was 80 hours well invested.

      2) an online shop with 4 load-balanced front-end servers and a big back-end MySQL server. When the PayPal newsletter came out and the shop was listed, the whole thing would keel over and die, when around 250 users were spread over the 4 front end servers - the query to generate the front page menu would go from under 1 second to over 2 minutes and the DBA would put in overtime restarting the MySQL database every few minutes.

      After 4 hours of looking at the code, optimizing some decision trees and re-ordering the "WHERE" clauses of the SQL statements, under a load of over 250 users PER SERVER the menu query was down to under 500 milliseconds and the loading of the front page was under 4 seconds.

      They could have thrown a bigger database server and more load-balanced front end servers at the problem, but that wouldn't have been economical, especially when a programmer who understood MySQL and processor architecture was let loose on the code and could get that sort of performance improvement for less than the price of a new SAS drive...

      Those are both real-world examples I was involved in. I was brought in to fire-fight both projects, the first an MS-DOS based system, written in BASIC by FORTRAN mainframe programmers and maintained by COBOL mainframe programmers. Having someone who actually understood PC architecture and where the weaknesses were (video output was the biggest bottleneck) made that huge difference.

      Likewise, the second one was a couple of years back. The code was elegant and easy for a human to read, but the devs had little or no knowledge of processor architecture (and how to optimize PHP to work more efficiently) and little to no knowledge of optimizing MySQL. Quickly re-ordering the queries and some ifs and loops was all that was required; it was still elegant and easy for a human to read, but more importantly, it was also efficient for a computer to read and execute.

  3. J. R. Hartley

    Quantum

    Where does quantum computing fit into all of this?

    1. ThatOne Silver badge
      Devil

      Re: Quantum

      > Where does quantum computing fit into all of this?

      Principle of uncertainty applies...

    2. Ian Johnston Silver badge

      Re: Quantum

      It relies on fusion power and Linux on the Desktop. As soon as those have been cracked, quantum computing will be along. Well, in ten years.

      1. jake Silver badge

        Re: Quantum

        They'll all be delivered in a flying car, driven by a household robot which can also dig the spuds, do the dishes, fetch me the mail and a beer, and change the sprog's nappies ... all in one unit.

      2. Doctor Syntax Silver badge

        Re: Quantum

        Linux has been on my desktops and laptops for years. A glance out of the window shows that fusion power is ticking along nicely, as it has been all these billions of years. So what's holding up quantum computing?

        1. Tomato42
          Trollface

          Re: Quantum

          that uneven spread of The Future

    3. redpawn

      Re: Quantum

      Fusion-powered quantum processors cooled with black hole technology allow quantum computing to obey Moore's Law for the foreseeable future. Smart coding can wait.

      1. bombastic bob Silver badge
        Coat

        Re: Quantum

        actually, you just need to make the inside bigger than the outside. Voila!

      2. John Brown (no body) Silver badge

        Re: Quantum

        "Fusion powered"

        Really? That's so last century. All the cool kids are working on Zero Point Energy these days. It's the next (in only 50 years!!) big thing.

    4. Anonymous Coward
      Anonymous Coward

      Re: Quantum

      It will or it won't or it will and won't at the same time though you can't be sure even when looking at it.

    5. Annihilator

      Re: Quantum

      In one of the universes, it's already here.

  4. Anonymous Coward
    Anonymous Coward

    DEC Fortran

    There was a time when the DEC Fortran compiler was the bleeding edge of compiler design and every version was eagerly awaited, so we could eke out more performance from code. I was involved in important finance applications which depended on the quality of the code and the performance on whatever hardware DEC could provide. In short, we crafted and cared about code quality, the efficiency of compiler optimization & linkers, and algorithm performance in user code and the libraries. We even had image profilers to optimize application images for specific hardware characteristics.

    The world reverted to BASIC and interpreters, and a multi-billion dollar company was created with bloated, horrible code. C was mainstream for no good reason, and then C++ and the whole "object" thing got out of hand and code got worse and more bloated. Hardware got faster and cheaper and no one cared, least of all MS, who were taking over the world and re-aligning the industry's concept of quality and performance massively downward. Java VMs came for another level of abstraction from the HW, and for more sluggishness and massive unreliability. Then we all went WWW and more interpreted and useless code was foisted upon us. JIT got in there to help but it hardly made a jot of difference.

    It's all so blurry now ... why did it go so wrong?

    The answer is "Russian Programmers", or at least that mindset.

    /end of rant

    1. jake Silver badge

      Re: DEC Fortran

      I know folks who say that that DEC Fortran compiler is still the bleeding edge of compiler design. At least a couple of them are in Redmond working on the Windows kernel.

      Russian programmers aren't the answer. They are still working with stolen ideas, and have come up with few of their own.

      1. Anonymous Coward
        Anonymous Coward

        Re: DEC Fortran

        I am sad to report that I haven't seen a VMS/Rdb machine since 2007, as I was forced over to the dark side of WindowsServer/SQLServer. Money and opportunity were the principal reasons. It was horrible, 8 years of horror, but mainly because of the MS fanboys of limited skill and even more limited knowledge with whom I had to share oxygen.

      2. Anonymous Coward
        Anonymous Coward

        Re: DEC Fortran

        The "Russian Programmer" reference was from the Michael Lewis book where one of the principal characters is a Russian programmer who himself is infamous.

    2. YetAnotherJoeBlow

      Re: DEC Fortran

      So, back full circle. It's nice to know that the way I was taught in the 70's and 80's is back in vogue again. I never lost sight of that and continue to this day writing fail-safe code.

      I have fond memories of DEC FORTRAN. Both F IV and F77. I used to burn EPROMS under RT11 and RSX.

    3. Alan J. Wylie

      Re: DEC Fortran

      Back in the early 80's, I used DEC FORTRAN on a VAX-11/780 developing an early Geographic Information System (as it is called these days). I remember one program (interpolating spot heights on a grid from contour lines, perhaps) which did a lot of looping over arrays. There was a DEC-supplied program that drew a text-based representation on a VT100 of pages being swapped (paged?) in and out of memory (we originally had a huge 512kB, later expanded to 3/4 of a MB). You could see when you had your array indices the wrong way round: pages were rapidly swapped in and out all over the place, rather than a neat little chunk with pages being added at the end and lost at the beginning.
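
      The same effect is easy to show today with NumPy, whose arrays are row-major by default: traversing along rows stays in consecutive memory, traversing down columns strides across it. A small sketch only - the gap was far more dramatic when the strides meant page faults on a 512kB machine:

      ```python
      # Row-order vs column-order traversal of a row-major (C-order) array.
      # Summing contiguous rows touches memory sequentially; summing down
      # columns jumps around, which is what thrashed the pages on the VAX.
      import time
      import numpy as np

      a = np.random.rand(5000, 5000)

      start = time.perf_counter()
      row_total = sum(a[i, :].sum() for i in range(a.shape[0]))
      print(f"row-wise:    {time.perf_counter() - start:.3f} s")

      start = time.perf_counter()
      col_total = sum(a[:, j].sum() for j in range(a.shape[1]))
      print(f"column-wise: {time.perf_counter() - start:.3f} s")
      ```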

      That computer (about 1MIPS) and memory were enough to run an interactive line following digitising program as well as several developers simultaneously editing and compiling.

      1. djnapkin

        Re: DEC Fortran

        512KB? We used to dream of having that much memory around 1980. The Unisys mainframe was as big as several cupboards, and had 192KB of memory. Each memory card was 16KB but was the size of what would be a large motherboard today.

        Sure taught us how to make sure our programs were optimised.

        1. Anonymous Coward
          Anonymous Coward

          Re: DEC Fortran

          We lived in shoebox in middle of road!

          But seriously, I worked on a S/34 writing RPG II code where we only got 1 (sometimes 2) compiles per day. The S/34 we used was maxxed out (256K of RAM and 256MB) and Big Blue had no upgrade path. Sometime after that they invented the S/36 but that was a long time coming.

          I was glad to be out of the place, but it sure taught me to appreciate some good skills. There is just so much you can do with 99 indicators.

          1. big_D Silver badge

            Re: DEC Fortran

            Shudder. I remember S/36 and RPG II and III.

    4. Graham Newton

      Re: DEC Fortran

      My final year project at Uni was a mathematical model of the human eye at low light intensity. Like the article, it relied on loops within loops. The university computer was a DEC 10 and the program was written in Fortran.

      I spent a lot of time ensuring that the program would run to completion without intervention. Unlike my fellow student who would babysit their programs overnight.

      However my program consumed 12 hours of run time and was terminated. I got a "see me" email.

      I was worried, this was my final year project.

      They didn't bollock me, but suggested that I send my program to Manchester University. Not knowing about modems, I thought I had to send my program by post, but was put straight on this.

      After a compile failure, a CDC (Control Data) machine executed my program in less than a second.

      This taught me that you had to program to the machine, not try to make it do things it wasn't built to do.

      So for example I have:

      Used the Transputer's ability to do 2D memory manipulation, parallel processing and CPU core linking to do image processing for world-class astronomical telescopes.

      Produced a minimal memory and CPU cycle timeslicing OS to run experiments on the Cassini Huygens lander.

      Programmed SHARC DSPs to operate on multiple audio streams concurrently using the SIMD mode.

      Normal programming makes me weep, it's sledgehammer all the way, no artistry, no finesse.

      I sit and wonder WTF when it takes several seconds for a Word document to load.

      1. Anonymous Coward
        Anonymous Coward

        Re: DEC Fortran

        I feel unworthy ...

    5. big_D Silver badge

      Re: DEC Fortran

      We were using it for seismic surveys of oil and other mineral fields (predominantly oil). You needed to eke every last millisecond out of the calculations, because they would tie up the computer room full of VAXes for hours at a time.

      I related this above, but the optimization was so good, one mainframe sales-rep went away with his tail between his legs. They gave us a mainframe to play with and a test-suite to run in parallel on a spare VAX. The test-suite was expected to run for a week on the mainframe and a few weeks on the VAX. We should call him in a week, when the mainframe was finished.

      When he got back to his office an hour later, there was a message for him to call us, the VAX was finished. The DEC FORTRAN compiler had looked at the code, worked out that 1) no input, 2) fill random array, 3) no output meant that 2 was superfluous and optimized that out of the executable, which was essentially empty and took less than a second to run...

    6. Greybearded old scrote Silver badge

      Re: DEC Fortran

      I was going to protest the slur against a whole nation.

      Then I remembered the person who trashed performance on my favourite language with inefficiently implemented extended object libraries and an absolutely hideous ORM. He is, in fact, from Russia. A single datum I know, but it's the only one I have.

      Sadly both of those lardy things are generally seen as the One True Way these days. I'd fail at interview if I even mentioned any misgivings about them.

    7. Dan 55 Silver badge

      Re: DEC Fortran

      There's nothing that says BASIC has to be interpreted: Dartmouth's wasn't, and the ones running on 1970s mainframes weren't. CBASIC on CP/M wasn't either. The late 1970s-1980s home computer versions were interpreted (and Microsoft did many of those, so you know where the blame lies), but later on they became compiled too as home computers and PCs became more powerful.

      1. big_D Silver badge

        Re: DEC Fortran

        Microsoft was selling a BASIC compiler throughout the 80s. Many computers came with BASIC interpreters built into ROM, including the original IBM PC (standard configuration was a cassette port and BASICA in ROM, no floppy drive - well, and no keyboard or display either, everything was extra).

      But Microsoft also sold a compiler for CP/M and MS-DOS. We had code that ran on the HP-125 (CP/M), HP-150 (MS-DOS), HP Vectra (sort of IBM compatible, only not very; MS-DOS) and IBM PCs (PC-DOS). All were compiled using Microsoft's BASIC compiler, and you had to replace the header file, which contained the definitions (strings with escape codes) for accessing the screen (moving the cursor, clearing the screen, inverse video, bold etc.) for each platform.

  5. jake Silver badge

    They are eyeballing the problem from the wrong orthogonal perspective ...

    "the tech industry needs software performance engineering, better algorithmic approaches to problem solving, and streamlined hardware interaction."

    None of those is the "top of the compute stack". The real TOTCS is BKAC ... the nut behind the wheel. The human.

    What we need to start doing is teach humans to actually use computers, to get more out of them. And by use computers, I don't mean facebook, twitter and power point etc.; nor do I mean iFads and Fandroids. I mean actually learning how to use computers.

    It'll never happen, alas. Not as long as marketing is still runni ... oh! SHINY!

    1. Anonymous Coward
      Anonymous Coward

      Re: They are eyeballing the problem from the wrong orthogonal perspective ...

      So it can't be the real TOTCS because humans don't compute. Otherwise, we wouldn't have computers in the first place. So the article is still correct.

      1. John Brown (no body) Silver badge

        Re: They are eyeballing the problem from the wrong orthogonal perspective ...

        Before digital electronic computers we had electric analogue computers. Before those, we had mechanical computers. Before those, "computer" was a job title.

        1. Anonymous Coward
          Anonymous Coward

          Re: They are eyeballing the problem from the wrong orthogonal perspective ...

          That was then. This is now. You underestimate the rate of dumbing of the collective human intelligence. Ask one of today's "experts" to figure out trigonometry with a slide rule.

    2. Greybearded old scrote Silver badge
      FAIL

      Re: They are eyeballing the problem from the wrong orthogonal perspective ...

      I disagree; there's no excuse for requiring the mundanes to understand the innards, any more than they have to be able to strip and reassemble their car. (Yes, some do, but it's optional.)

      Given that what is in your pocket would have qualified as a supercomputer not so long ago we ought to be able to have shiny that is also amazingly fast. We don't, for the sorts of reasons detailed in the article.

      We, as an industry, have failed everybody.

  6. Duncan Macdonald

    Look first at the problem

    Often a huge speedup can be obtained by spending a few minutes thinking about the problem before starting the design.

    Many years ago the company that I was working for needed to do a lot of computation on a few days' values from an Oracle database that had multiple years of data. The code that the consultants came up with worked - but it would have taken over 2 weeks to produce the results, as the main table was joined to itself in the query in a way that negated the speedup of the indexes. A bit of thinking, and a much smaller table was produced by selecting only the required days from the main table and running the query using that table instead. That reduced the time required from over 2 weeks to under half an hour.

    Another system was monitoring temperatures in a power station - the original spec had all the temps being monitored every second, which was too much for the low-performance minicomputer of the time (early 1980's). A bit of looking at the items being monitored showed that many did not need a high scan rate (if the concrete pressure vessel temps are changing significantly in under 30 secs then it is well past time to run like hell!!). Changing the spec so that only the required items were scanned at high speed made the job easy for the computer to handle.

    If you have a job that is going to heavily load a computer system, then it is often worth spending some time trying to understand the problem (not just the spec) and seeing if there are any obvious inefficiencies in the spec that can easily be mitigated before starting coding.

    1. Julz
      Joke

      Heretic

      " spending a few minutes thinking about the problem before starting the design"

      How can you move fast and break stuff efficiently if you stop and think first! As for a design, what the hell...

      1. TwistedPsycho

        Re: Heretic

        If you don't work fast, you can't take on more underpaid jobs, and then who is going to keep the Vodka flowing!

  7. ratfox
    Pint

    The authors stress the need for hardware makers to focus on less rather than Moore.

    *clap clap clap*

  8. Ian Johnston Silver badge

    Anyone who multiplies two matrices like that is an idiot. Doing it properly was Exercise 1 in the IBM "How to use a supercomputer" course I took at the Rutherford Lab's place in Abingdon in 1989.

    1. KorndogDev

      This!

      Anyone who does not use an optimized library dedicated to numerical computations is an idiot. And of course even Python has one.

      1. bombastic bob Silver badge
        Meh

        Re: This!

        the example from the article in Python was an attempt at showing a grossly inefficient method vs optimized code. Use of "yet another Python lib" isn't helping, nor a solution, to this kind of gross inefficiency. You simply do NOT use an interpretive lingo like Python, with slow code repeated a zillion times within a big loop, like that example does. You write it properly, with efficient methods, in a lingo that's capable of rapidly and efficiently performing the necessary calculations. I believe THAT was the point.

        So, write it in C, using a hand-tweaked threaded algorithm, and inline assembly for the innermost parts. That'll do nicely!

        Based on this example, I'd say there are 2 kinds of programmers: Those who code low-level efficient code for things like kernel drivers and microcontrollers, and those who don't. The ones who don't often use inefficient lingos like Python and Javascript and THEN claim that "libraries" will somehow 'fix' the inefficiency. But they never do.

        (am I the only one who got that out of this portion of the article?)

        1. DavCrav

          Re: This!

          "So, write it in C, using a hand-tweeked threaded algorithm, and inline assembly for the innermost parts. That'll do nicely!"

          As I mentioned in a previous comment, C code, hand optimized, is about 100x or more slower compared with a big brain (not mine) and assembler code that is optimized to hell, using each level of cache perfectly and only calling from RAM as and when needed.

          1. Paddy

            Re: This!

            > ... using each level of cache perfectly

            So this is code for only one CPU cache setup? That's a pretty precise hardware spec, not very useful to others in general.

            1. DavCrav

              Re: This!

              No, he uses the minimum amount that exists over all standard cores. What I meant by using each level perfectly is doing the right operation in the right level, and maximizing the number of operations done before moving the piece elsewhere.

              Almost all of the algorithm is memory management, trying to minimize I/O, because that's the real bound on the algorithm; it isn't bound by mathematical operations. He has to create his own scheduler in some sense, because there's no 'off the shelf' program that does what he needs. He needs to make sure that each piece of data hits the right core at the right time so it's used efficiently.

              He is, arguably, the leading authority on multiplying large matrices. His programs (Meataxe64) are the ones used by all people who want to manipulate big matrices.

        2. KorndogDev
          Go

          Re: This!

          No Honey, there is a reason why specialized libraries exist and are used by millions of people. One day you will discover it, I am sure of it.

          Meanwhile, you can still hope that your super-optimised hand-crafted code from last night is bug free and one day it will serve next generations.

    2. Anonymous Coward
      Anonymous Coward

      Yeh, not a good choice - real-world problems tend to be sparse-matrix ones, and matrix multiply is implemented as a parallel algorithm nowadays anyway. So why mention that old brute-force algo and then mention that better algos exist???

      I honestly don't get the point of the article.

      " argue that the tech industry needs software performance engineering, better algorithmic approaches to problem solving, and streamlined hardware interaction."

      Yeh, but that's exactly what it is now. A new algo is designed, the hardware gets new APUs to handle it. How is that not what happens now?

  9. Gene Cash Silver badge

    They shoot themselves in the foot

    So they go on and on about how conventional CPUs should be improved to handle vector calculation... then run their code in an instant on a GPU, which is DESIGNED for massively parallel vector calculation.

    This is like saying "you don't need a floating point unit, the CPU should do that" or "you don't need a GPU, the CPU should do that"

    I think they should get out of their ivory tower.

    1. Joe W Silver badge

      Re: They shoot themselves in the foot

      That was the paper in a nutshell. A specialised processing unit beats a general-purpose CPU hands down - not only for this task, but also for others. Same with optimised libraries, more so if it's a problem that is not embarrassingly parallel (multiple instances of the same code run side by side to speed it up).

      No, I don't find it surprising. It is a good reminder, but bleedin obvious. No wonder Science published it...

    2. djnapkin

      Re: They shoot themselves in the foot

      I had a rather different take on the article to you. I thought it was well laid out and covered the progress of the optimisation with great clarity. I'd say the results from optimising on a multi-threaded CPU were impressive. The overall message of optimising your software was well carried.

      Threading is beyond many programmers, and running on a GPU is surely a specialised art - and I am not sure how many servers, either in-house or cloud, would have GPUs. Perhaps they do. I just have not heard of it being a thing.

      1. Anonymous Coward
        Anonymous Coward

        Re: They shoot themselves in the foot

        GPUs as a specialized server calculation unit have been a thing for a number of years now. Thus nVidia's Data Center GPUs designed for HPC and Deep Learning functions and so on.

    3. John Brown (no body) Silver badge

      Re: They shoot themselves in the foot

      "This is like saying "you don't need a floating point unit, the CPU should do that" or "you don't need a GPU, the CPU should do that""

      It looked to me like they were demonstrating not just optimisation but strongly emphasising the types of optimisation. They then showed what happens when you reach the end of the path on a general-purpose CPU and went on to show how specialised, dedicated processors could further optimise specific tasks in specific ways. So, as we reach 5nm and likely can't go smaller, there are fewer methods of optimising the hardware of general-purpose CPUs, so software people need to concentrate on optimising their code and the hardware people need to step up with more and better specialised processors.

      You only have to look at GPUs, why they were invented and what they are now used for. Likewise in audio and similar waveforms. DSPs do that very well, but you can do signal processing, albeit more slowly on a general purpose CPU. Or Cryptoprocessors designed for, well, you get the idea.

      1. Anonymous Coward
        Anonymous Coward

        Re: They shoot themselves in the foot

        But then there's the old tradeoff: ASICs are good at what they do, but what happens when the job you need shifts away from the ASIC's specialty? That's why the big push for general-purpose computing in the first place. Sure, you end up with the Jack-of-All-Trades problem, but at least you're likely to find something for it to do that would make any ASIC choke because the job at hand isn't within their realms of expertise. Put simply, there's a reason the world still has General Practitioners along with Specialists.

  10. Anonymous Coward
    Anonymous Coward

    C rocks.

    It really does.

    1. djnapkin

      Re: C rocks.

      Yes it does, until the wrong subscript variable is used for an array index and you can't figure out why you occasionally get memory corruption in a large program, causing disaster.

      Not that this ever happened to us.

      1. Anonymous Coward
        Anonymous Coward

        Re: C rocks.

        Well, it's the old tradeoff. If you want to go all out, you can't have safeguards get in your way.

      2. Rich 2 Silver badge

        Re: C rocks.

        ... whereas in some “higher level” (scripty) languages I can think of, the program doesn’t crash - you just get gibberish because undefined variables magically come into existence!! That’s even harder to debug. At least when C crashes, you can get a core dump out of it, which often points to the problem pretty quickly.

      3. Tom 7

        Re: C rocks.

        You can do unit testing in C too, you know!

        1. Anonymous Coward
          Anonymous Coward

          Re: C rocks.

          But then you're gonna have to test the unit test as well. Since C doesn't include its own safeguards, you can easily end up in a Turtles-All-The-Way-Down situation.

    2. munrobagger

      Re: C rocks.

      C is just a well-dressed assembler, but the same rules apply.

      I worked on a C application that wrote large chunks of CSV numbers. It turns out the implementation used strcat (yes, that old) to write each number to an ever-growing buffer string. But strcat uses the null terminator to find the end of a string, so each additional write required strcat to start at the beginning of the buffer and scan all the way through it before appending. Performance was horrendous when the chunks got large. The simple solution was to maintain a pointer to the end of the buffer string - and modern libraries will do all this for you - but it was a very good lesson on the need to know what is happening under the hood.
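
      The same trap exists in Python, incidentally: repeatedly concatenating onto one ever-growing string can re-copy the buffer each time, while collecting the pieces and joining once is linear. A small sketch of the analogous fix, not the original C code:

      ```python
      # Quadratic-ish: each += may copy the whole buffer built so far.
      def build_csv_slow(numbers):
          out = ""
          for n in numbers:
              out += f"{n},"
          return out

      # Linear: build the pieces and join once at the end - the moral
      # equivalent of keeping a pointer to the end of the buffer instead of
      # rescanning it on every append.
      def build_csv_fast(numbers):
          return "".join(f"{n}," for n in numbers)

      nums = list(range(100_000))
      assert build_csv_slow(nums) == build_csv_fast(nums)
      ```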

      1. Brewster's Angle Grinder Silver badge

        Re: C rocks.

        When I moved from asm to C and discovered null terminated strings were the norm I had a heart attack. It was a real step down. The problems you outline are just the beginning.

  11. Anonymous Coward
    Anonymous Coward

    Underpinning Ideology

    I know there's a risk that this comment may just sound like the ranting of some old bawbag, but do hear me out.

    About 25 years ago, I worked for a smallish financial institution. The even smaller investment arm was good. Very good. They frequently topped the charts with their performance, using an in-house dBase application (eventually Clipper) that they had honed over 8 or 9 years by that stage.

    There were frustrations. Data had to be loaded each day from other systems, but the biggest issue was the network IO for each client workstation. This was seen as a Moore's Law issue at the time, so more hardware was thrown at it, but really it was that IO problem.

    After a while, they bought in an Oracle based system running on dedicated minicomputers. The initial budget was £1million and they ran way beyond that by the time the new system was working.

    But... Their investment performance didn't so much tail off as fall off a cliff. Whatever their previous modus operandi, the new system simply did not allow it, and within a short time they became also-rans in the investment game.

    I often think of this experience, as around the time of their migration I wanted to explore the possibility of a centralised system for running the Clipper app - like a Linux box running dosemu - which would have resolved their IO problems.

    I'm not suggesting that that old Clipper program could have been stretched out indefinitely, but that following a perfectly valid and acceptable path tossed the baby out with the bathwater.

    The main point illustrated by this example, though, is that we in technology are, I think, so deeply inculcated with the assumptions of "upgrade", "better", "improved" and various other ideological concepts so closely allied to the technology industry that we never even question them. We think such assumptions of improvement are a natural part of our lives, even though we should be able to learn from the conclusions of, say, Enlightenment thinkers who similarly got mired in a philosophy of improvement. We do not stop to question ourselves frequently enough, or ever.

    It may be argued that this ideology of improvement is what has driven the technology industry to achieve the heights it has. That may be the case, but we also need to think about limits to these assumptions, because of other unintended consequences, and also because reason dictates that "believing" in principles like Moore's Law is highly unlikely to be sustainable. We may, ourselves, become part of a problem, if we leave such underpinning assumptions unexamined.

    1. MJI Silver badge

      Re: Underpinning Ideology

      Clipper

      We used it up to XP; we had great database support, especially when running on NetWare.

      Our data server was Advantage Xbase Server, later Advantage Database Server.

      https://en.wikipedia.org/wiki/Advantage_Database_Server

      We were wiping the floor with the SQL-based competitors on performance.

  12. Rich 2 Silver badge

    “... tech industry needs software performance engineering, better algorithmic approaches to problem solving, and streamlined hardware interaction”

    I read that and my first thought was “shame the world seems to be migrating to stuff like python“

    ... only to have the issue highlighted a couple of paragraphs down. It’s been said zillions of times before of course, but hacky scripty languages like Python that pay no regard whatsoever to how the underlying hardware works are the reason we all use multi-GHz multi-core power-eating monster computers - just to do a bit of word processing.

    Hacky scripting languages have their place, but things would work a lot better if the authors of the bigger (in size and/or time) Python applications learned to use a more effective language that runs (for example) 47 times faster!! What a pile of shite modern software is :-(

    1. KorndogDev
      Holmes

      that puzzle

      There is a reason why a guy who is a C language expert decided to create Python (and write it in C).

      Now, YOU go and look for it.

    2. Tom 7

      Modern software is not shite. There are just a huge number of inexperienced programmers out there. As I said in a different comment, I used Python to run a matrix multiplication test that ran a fraction of a millisecond slower than the C++ version - because I had the experience to know it would be a problem and could use a library that would do it as well as possible. If I'd had a GPU I could have used that and the Python would have been even faster - it might even have finished running while I was still telling the C++ version to use that library too. I might even have done the whole thing in ROOT, which would have been quicker than writing it in C++ and going through the compile-and-run cycle. Python is a good tool for playing and prototyping and exploring things - but most of the Python ecosystem is written in C++ for a reason - well, my pip installs seem to run GCC more than I do!

      The problem we have today is computing is incredibly complicated and yet people come out of a 3 year university course knowing maybe 10% of what they need to know.

    3. Anonymous Coward
      Anonymous Coward

      "It’s been said zillions of times before of course, but hacky scripty language’s like Python that pay no regard whatsoever to how the underlying hardware works are the reason we all use multi-GHz multi-core power-eating monster computers - just to do a bit of word processing."

      Hacky scripty languages like python and tcl are what form the backbone of silicon design/implementation. Without them you wouldn't have the performance increase seen in CPUs/GPUs over the last 25 years.

      EDA software on the other hand could do with some bottom-up rethink/redesign to better harness the potential of said silicon.

    4. Paddy

      Look! Someone's lying on the internet!!!

      > "hacky scripty language’s like Python that pay no regard whatsoever to how the underlying hardware works"

      Read the groups frequented by the CPython core developers and you would see your error. Read up about Timsort and you would find that those core developers also look above the hardware, at how the language and its libraries are actually used, and optimise that too.

      You may suffer from a narrow view of "to optimise" not shared by all.

  13. Richard Boyce

    Fundamental problem

    Businesses often regard external costs as irrelevant. For example, consider how much has been wasted because it's cheaper for Microsoft to produce inefficient products when it's the users who are paying for the megawatts of power and waiting for something to happen.

    Even within a company, a manager can get rewarded if his department produces something quick and dirty for some other department to use. The costs are coming out of someone else's budget.

    More competition helps, but we also need user education to accentuate the negative feedback, especially when mother nature is on the receiving end of planned inefficiency.

  14. RichardEM

    The same old problem

    It seems to me, from what I read about the article (I didn't go behind the paywall) and from many of the comments and replies, that people are either saying or implying that the problem is not hardware, software or another specific thing. The problem is the same one I ran into when I was consulting: asking the right questions. Many times the client would say "we need to do this", not really addressing the problem itself but rather a problem with the result of what they were presently doing.

    I was constantly trying to find out what the root of the problem was so I could get them what they needed.

  15. Anonymous Coward
    Anonymous Coward

    Python 3

    Looks like the language could benefit from some behind-the-scenes tweaking of its algorithms, and from making the most of the hardware and OS that the program is being run on.

  16. DaemonProcess

    optimize / optimise

    There are after-market optimizing compilers for (even) Node, Python and Java - a language is just a language, GC or not. People just tend to use them in the standard manner (interpreted, JIT or whatever). A compilation stage could simply be added to a devops pipeline, if only people trusted their near-to-non-existent testing these days (a-hem iOS and Android).

    I programmed one of the world's first prototype cash machines. It only had 256 12-bit words of RAM in magnetic core storage. Out of that I had to handle screen i/o, cash dispensing and comms. Obviously it had fairly limited functionality and there were separate i/o processors and hardware controllers to handle stuff, but it's amazing what skills we have lost. For example, I had to use self-modifying code to save memory, so the screen output routine was essentially the same as the cash dispensing and serial output loops, with a few changed words in the middle as required.

    As for the comment about C being close to assembler, there was a language in between those, called NEAT-3, which was like assembler but with variables; it was a fun way to learn about instructions, stacks and algorithms. You also get millicode and microcode, but those are different subjects.

    You are not a real programmer unless you remember when an assembler multiplication of 200x5 could be made faster than 5x200. But then again I did once know a programmer whose idea of a program was a C header followed by 200 lines of un-commented assembler...

    1. Charles 9

      Re: optimize / optimise

      "You are not a real programmer unless you remember when an assembler multiplication of 200x5 could be made faster than 5x200."

      Unless every little cycle counted (in a limited-resource environment, I'll grant you), the difference really wouldn't be all that great. If you take the shift-and-add approach, both types of instruction are usually pretty cheap time-wise, and you'd only need one additional shift-and-add: 5 is 4+1 (two terms) versus 200 being 128+64+8 (three).
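
      To make the counting concrete, here is a small shift-and-add sketch (illustrative only, not tied to any particular CPU): the number of adds equals the number of set bits in the multiplier, so 5 (binary 101) needs two while 200 (binary 11001000) needs three.

      #include <stdio.h>
      #include <stdint.h>

      /* Shift-and-add multiply: one add per set bit in the multiplier, which is
         why operand order could matter on CPUs without a hardware multiply. */
      static uint32_t shift_add_mul(uint32_t multiplicand, uint32_t multiplier, int *adds)
      {
          uint32_t result = 0;
          *adds = 0;
          while (multiplier != 0) {
              if (multiplier & 1u) {        /* this bit contributes a shifted copy */
                  result += multiplicand;
                  (*adds)++;
              }
              multiplicand <<= 1;           /* the next bit is worth twice as much */
              multiplier >>= 1;
          }
          return result;
      }

      int main(void)
      {
          int a, b;
          printf("200 x 5 = %u using %d adds\n", shift_add_mul(200, 5, &a), a);   /* 5 = 4+1 */
          printf("5 x 200 = %u using %d adds\n", shift_add_mul(5, 200, &b), b);   /* 200 = 128+64+8 */
          return 0;
      }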

    2. G.Y.

      ASM Re: optimize / optimise

      I heard that, when the law "thou shalt write COBOL" came out, lots of COBOL programs were written where line 1 was "enter assembler" and all else was assembly code

    3. Brewster's Angle Grinder Silver badge

      Re: optimize / optimise

        Self modifying code was a lot of fun. But an absolute bastard to maintain. And let's face it: memory, even cache, isn't in short supply, and writing to the code segment is a security nightmare.

    4. John Smith 19 Gold badge
      Unhappy

      "I programmed one of the world's first prototype cash machines."

      Was that the IBM one with the one-line dot matrix LEDs, or were they still using a printed roll of instructions?

      But TBH I only know either by reputation.

  17. iron Silver badge

    > the MIT researchers wrote a simple Python 2 program

    At which point they lost all credibility. At least use a currently supported version of a language, not a very old and out-of-date one, and preferably a language that is actually up to the task put before it. Python is not a suitable language for matrix maths, which is about all they proved.

    1. Anonymous Coward
      Anonymous Coward

      This car won't go! (key?)

      and you not reading the following paragraphs lost you all credibility. A starting point for them should not be the ending point for you, unless you really didn't want to know what the article said. Go read the article.

    2. Anonymous Coward
      Anonymous Coward

      Don't get me started on language bloat. There are too many languages and there seems to be a new one every year claiming to be better and different than last year's language.

  18. Alister

    deader than corduroy bell bottoms

    WHAT!

    why did nobody tell me?

    1. John Brown (no body) Silver badge
      Windows

      Re: deader than corduroy bell bottoms

      You are an OU lecturer and ICM£5

  19. Robert Grant

    The code, they say, takes seven hours to compute the matrix product, or nine hours if you use Python 3. Better performance can be achieved by using a more efficient programming language, with Java resulting in a 10.8x speedup and C (v3) producing an additional 4.4x increase for a 47x improvement in execution time.

    A "more efficient language"? It's not the language, it's the runtime. Run that in MyPy and see the difference between that and CPython. Any techies on the staff?

  20. Anonymous Coward
    Anonymous Coward

    Oozlum Computing

    Many years ago, when dinosaurs still roamed the planet, I recall a project using one of these new-fangled computers to improve an inspection stage that took a trained inspector four days to complete (on each part). It involved checking thousands of small (say 2mm) holes drilled in a metal ring. He had to check each was clear using a wire. There were inevitably a few that were not clear, but returning to re-drill them was a lengthy and expensive operation. The engineers could accept a certain number of blocked holes for each 15deg sector. So poking the wire and then calculating acceptance (no electronic calculators then) took time.

    With the computer, a small photoresistor array and some early LEDs, the job was reduced to four hours in BASIC. Once the principle was proven and accepted as a working solution, the program was rewritten in assembler and the job reduced to 20min. Even then, it was recognised that any high-level language was inefficient in computer time.

    I recall, many years later when the ZX81 came out, how it was possible to program in just 1k of RAM (the aforementioned works m/c had an enormous 32k). I gave up any pretence at programming 30 years ago, other than simple VB spreadsheets, but it never ceases to amaze me how big programmers manage to make simple programs nowadays.

    My point - I reckon we can go a long way by looking at coding. We had the luxury of faster chips making it too easy for too long.

    Anonymous to permit flaming...

  21. Paddy
    Linux

    That's a wrap!

    Those same MIT professors should progress to *wrapping* their orders-of-magnitude-faster solution so it becomes simply callable from Python. That would then allow other scientists and engineers to benefit from superfast matrix multiplication with the ease of a Python function call. It's how people get things done in, for example, data science in simple "Python", without having to know the intricate details of all of the libraries they are using.

  22. Andy Non Silver badge
    Happy

    Designing the right algorithm always helps

    One of the first programs I wrote for my employer was on an Apple II, as I remember, back in the early 80's. At the time one of their programs was taking around 2 to 3 DAYS of continuous processing and I got that down to around half an hour. The software had to reconcile invoice information for the accounts dept (outstanding invoices against payments made, as I recall). Essentially there were two very large sequential lists (in data entry order) to check off against each other.

    The guy who had written the original software worked his way down the first list and checked every item against the second list to see if it matched the invoice number, so the second list was being searched top to bottom thousands of times. It worked, but the poor algorithm wasted lots of processing time.

    My approach was to sort both lists first by invoice number, then proceed crabwise down both lists, necessitating only one pass of each list. The extra overhead of doing the sorts first was massively outweighed by the subsequent fast comparison. I found the sort algorithm in an old Commodore Pet book - Shell Metzner. Used it quite a lot after that. This was back in the days of sequential files, before databases came along.
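
    A minimal sketch of that sort-then-single-pass approach, with a made-up record type and data (the original was on an Apple II, so this is illustration only):

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical record: just an invoice number, for the purposes of the sketch. */
    typedef struct { long invoice; } Rec;

    static int by_invoice(const void *a, const void *b)
    {
        long x = ((const Rec *)a)->invoice, y = ((const Rec *)b)->invoice;
        return (x > y) - (x < y);
    }

    /* Sort both lists, then walk down them together ("crabwise"): one pass
       instead of rescanning the whole second list for every entry in the first. */
    static void reconcile(Rec *inv, size_t ni, Rec *pay, size_t np)
    {
        qsort(inv, ni, sizeof *inv, by_invoice);
        qsort(pay, np, sizeof *pay, by_invoice);

        size_t i = 0, j = 0;
        while (i < ni && j < np) {
            if (inv[i].invoice == pay[j].invoice) {        /* matched: tick both off */
                i++; j++;
            } else if (inv[i].invoice < pay[j].invoice) {  /* invoice with no payment */
                printf("outstanding: %ld\n", inv[i++].invoice);
            } else {                                       /* payment with no invoice */
                printf("unmatched payment: %ld\n", pay[j++].invoice);
            }
        }
        while (i < ni) printf("outstanding: %ld\n", inv[i++].invoice);
        while (j < np) printf("unmatched payment: %ld\n", pay[j++].invoice);
    }

    int main(void)
    {
        Rec inv[] = {{1003}, {1001}, {1004}, {1002}};
        Rec pay[] = {{1002}, {1004}, {1001}};
        reconcile(inv, 4, pay, 3);   /* expect invoice 1003 reported as outstanding */
        return 0;
    }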

    1. John Smith 19 Gold badge
      Unhappy

      "The guy who had written the original software worked his way down the first list and

      checked every item against the second list to see if it matched the invoice number,"

      Yup. That's the classic dumbass coding method in a nutshell.

  23. nautica Silver badge
    Boffin

    Why all this thrashing around seeking a suitable language?

    "By understanding a machine-oriented language,‭ ‬the programmer will tend to use a much more efficient method‭; ‬it is much closer to reality".--Donald Knuth‭

    He's talking about assembly language, folks. Can't get more efficient than that. And then...

    "Simplicity and elegance are unpopular because they require hard work and discipline to achieve and education to be appreciated."--Edsger Djikstra‭

  24. nautica Silver badge
    Holmes

    The road to hell. It goes on and on and...

    There is a basic, basic philosophical disconnect here, and one which is taking over the whole of engineering and all forms of technological design--COMPLETELY--; to wit:

    ALL PROBLEMS CAN BE SOLVED WITH A SOFTWARE SOLUTION

    Get that? ALL problems...even those problems where a century of history and impeccable safety dictates the use of triply-redundant hardware. Here we have a basic problem, dictated by the immutable laws of physics; and the now-prevalent mind-set (with legitimacy provided by none other than the now-highly-questionable authority known as "MIT") says, "No problem, mon! We'll fix that with software! We can fix anything with software".

    Does the phrase "Boeing 737 Max" ring a bell, boys and girls? ...and MIT?

    The only answer to this mentality is the famous quote of Wolfgang Pauli--

    "This is so bad it's not even wrong."

  25. 89724102172714182892114I7551670349743096734346773478647892349863592355648544996312855148587659264921

    I wonder how long it will be before AIs make all programmers redundant

  26. Anonymous Coward
    Anonymous Coward

    Numpy

    Anyone doing serious numerical computations in Python would be using numpy, which can call MKL and other optimised libraries. Also, you would probably use Fortran or C for the number crunching and use Python to glue it together. For real world applications, matrices are often sparse or have a special structure, which allows the use of specialised algorithms that can be orders of magnitude faster.
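
    As an illustration of how much that structure can buy (a sketch, not from any particular library): a matrix stored in compressed sparse row form only ever touches the non-zero entries, so a matrix-vector product costs time proportional to the number of non-zeros rather than n².

    #include <stdio.h>

    /* Compressed Sparse Row (CSR) storage: for each row we keep only the
       non-zero values and their column indices, so y = A*x touches O(nnz)
       entries instead of O(n*n). */
    typedef struct {
        int n;              /* number of rows */
        const int *rowptr;  /* entries rowptr[i]..rowptr[i+1]-1 belong to row i */
        const int *col;     /* column index of each stored value */
        const double *val;  /* the stored non-zero values */
    } CsrMatrix;

    static void csr_matvec(const CsrMatrix *A, const double *x, double *y)
    {
        for (int i = 0; i < A->n; i++) {
            double sum = 0.0;
            for (int k = A->rowptr[i]; k < A->rowptr[i + 1]; k++)
                sum += A->val[k] * x[A->col[k]];
            y[i] = sum;
        }
    }

    int main(void)
    {
        /* 3x3 matrix [[2,0,0],[0,0,3],[0,1,0]] with three non-zeros. */
        const int rowptr[]  = {0, 1, 2, 3};
        const int col[]     = {0, 2, 1};
        const double val[]  = {2.0, 3.0, 1.0};
        const CsrMatrix A   = {3, rowptr, col, val};

        const double x[] = {1.0, 2.0, 3.0};
        double y[3];
        csr_matvec(&A, x, y);
        printf("%g %g %g\n", y[0], y[1], y[2]);   /* prints 2 9 2 */
        return 0;
    }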

  27. The Rest of the Sheep
    Holmes

    Hal Hardenberg Rides Again

    This article should sound familiar to anyone who remembers the DTACK Grounded newsletter.

    Its Editor (nom de plume FNE) spent many pages extolling the virtues of assembler and efficient programming. I've been hiding from COVID for a while and lost track of time, but didn't realize I was back in the early eighties again. DTACK is archived at http://www.easy68k.com/paulrsm/dg/ should you desire a '60s electrical engineer's take on the "software efficiency doesn't matter, the chips will always get faster" argument.

  28. Grumpy Rob

    Horses for courses

    One problem I've seen with software development is the old "when you've got a hammer everything looks like a nail". Young programmers learn one language and think that it's applicable to all problems they're given. So I've seen what should have been a simple web application run like an absolute dog because it used MEGABYTES of Javascript and MEGABYTES of HTML to render a few simple and small tables. But the developer was using Java/Swing (I think) and some client side libraries that were HUGE. Who needs click-sortable columns on a table with typically three or four lines of data?? But the developer clearly didn't know any better.

    Once you know a thing or two you can select Python for quick and dirty one-off jobs - it doesn't matter if it takes a few hours to extract/migrate data if you're only doing it once. While for a production task you may pick something more efficient and suited to the task - and (gasp!) actually do some thinking at the design stage.

    One of my early jobs (more than 30 years ago) was writing the telemetry driver for a SCADA system that had three dual-CRT operator consoles. All written in assembler and fitted into 256k bytes of core memory... including the OS. And the Interdata 32-bit mini had its performance measured in DIPS (dozens of instructions per second). As with a previous poster, when I sit waiting long seconds for a 4-page Word document to load on an i5 machine with 8GB of RAM I just shake my head in amazement/disgust.

  29. Anonymous Coward
    Anonymous Coward

    "Better performance can be achieved by using a more efficient programming language, with Java resulting in a 10.8x speedup and C (v3) producing an additional 4.4x increase for a 47x improvement in execution time."

    Well, "more efficient programming language" and Java don't really click. Java has always been shit and it is time everyone on the market realize this ...

  30. John Savard

    Oh, dear.

    I had read claims that the design of the Python interpreter was so advanced, code written in Python ran as fast as compiled code. Apparently that was mistaken.

  31. PaulVD
    Facepalm

    Amazing how many smart El Reg readers missed the point

    The authors picked a really simple problem for which we have a lot of analysis and some very good solutions, and showed that a really poor algorithmic choice falls far short of the achievable frontier. No doubt they did a bit of searching over languages etc to find a really bad starting point.

    But it is beside the point to argue that they should have used a modern BLAS library, a better language, and other optimisations that are obvious to all of us. They showed that there are design choices which make orders-of-magnitude differences to the performance of this very simple and well-understood problem.

    But now, apply that to problems that are not well-understood and for which there are no conveniently pre-optimised libraries: the database structures from which you extract that complicated query, or the nonlinear pattern-matching algorithm, or whatever programming and software design task you get paid for. Can thinking more carefully about your fundamental approach to the data structures or the mathematics yield orders of magnitude improvements? Given that we can no longer count on major improvements in future processing speed, we will have to depend on improving our high-level thinking about data structures, algorithms, and suitable programming languages.

    This is a very self-evident point, for which the authors have offered a correspondingly trivial example. My initial thought was that the article was not interesting enough to be publishable. But a surprising number of commentators have attacked the example and missed the underlying point, so perhaps the point is not as self-evident as it ought to be.

    1. Anonymous Coward
      Anonymous Coward

      Re: Amazing how many smart El Reg readers missed the point

      "This is a very self-evident point, for which the authors have offered a correspondingly trivial example. My initial thought was that the article was not interesting enough to be publishable. But a surprising number of commentators have attacked the example and missed the underlying point, so perhaps the point is not as self-evident as it ought to be."

      Plus there are the fundamental limits that make truly tackling these feats a matter of engineering: budgets and deadlines. IOW, taking a few extra minutes to think may mean missing the deadline...

  32. Torben Mogensen

    Dennard scaling

    The main limiter for performance these days is not the number of transistors per cm², it is the amount of power drawn per cm². Dennard scaling (formulated in the 1970s, IIRC) stated that this would remain roughly constant as transistors shrink, so with each shrink you could get more and more active transistors operating at a higher clock rate for the same power budget. This stopped a bit more than a decade ago: transistors now use approximately the same power as they shrink, so with the same number of transistors in a smaller area you get higher temperatures, which requires more fancy cooling, which requires more power. This is the main reason CPU manufacturers stopped doubling the clock rate every two years (it has been pretty much constant at around 3GHz for laptop CPUs for the last decade). To get more compute power, the strategy is instead to have multiple cores rather than faster single cores, and now the trend is to move compute-intensive tasks to graphics processors, which are essentially a huge number of very simple cores (each using several orders of magnitude fewer transistors than a CPU core).

    So, if you want to do something that will benefit a large number of programs, you should exploit parallelism better, in particular the kind of parallelism that you get on graphics processors (vector parallelism). Traditional programming languages (C, Java, Python, etc.) do not support this well (and even Fortran, which does to some extent, requires very careful programming to do so), so the current approach is to use libraries carefully coded in OpenCL or CUDA and call these from, say, Python, so few programmers would even have to worry about parallelism. This works well as long as people use linear algebra (such as matrix multiplication) and a few other standard algorithms, but it will not work if you need new algorithms -- few programmers are trained to use OpenCL or CUDA, and using these effectively is very, very difficult. And expecting compilers for C, Java, Python etc. to automatically parallelize code is naive, so we need languages that are designed for parallelism from the start and do not add features unless the compiler knows how to generate parallel code for them. Such languages will require more training to use than Python or C, but far less than OpenCL or CUDA. Not all code will be written in these languages, but the compute-intensive parts will be, while things such as business logic and GUI stuff will be written in more traditional languages. See Futhark.org for an example of such a language.

    In the longer term, we need to look at low-power hardware design, maybe even going to reversible logic (which, unlike irreversible logic, has no theoretical lower bound on power use per logic operation).

    1. Charles 9

      Re: Dennard scaling

      But what happens when you get caught between Scylla and Charybdis: stuck with an inherently serial job that requires a lot of raw computing power BUT can't be parallelized? Or even just a job that is highly serial (like high-ratio compression, including video compression)?

      1. Torben Mogensen

        Re: Dennard scaling

        Video compression is not really highly serial. The cosine transforms (or similar) used in video compression are easily parallelised. It is true that there are problems that are inherently sequential, but not as many as people normally think, and many of those that are are not very compute intensive. It is, however, true that not all parallel algorithms are suited to vector parallelism, so we should supplement graphics processors (SIMD parallelism) with multi-cores (MIMD parallelism), but even here we can gain a lot of parallelism by using many simple cores instead of a few complex cores.

        But, in the end, we will have to accept that there are some problems that just take a very long time to solve, no matter the progress in computer technology.

        1. Charles 9

          Re: Dennard scaling

          "The cosine transforms (or similar) used in video compression are easily parallelised."

          Not if they're dependent on the ones BEFORE them, and the most efficient video codecs are INTER-frame, meaning you can't do the next frame until you've done the one before it. This is why x264 didn't go multithreaded for a dog's age and even now takes approaches that appear to have tradeoffs in quality or speed.

  33. KBeee
    Joke

    You all missed the point

    I can't believe that everyone commenting BTL has missed the whole point of the article! The article is purely there to denigrate those of us that choose to wear corduroy bell bottom trousers!

  34. Marco van de Voort

    loop tiling

    Before you start throwing in technologies that require a radically different approach, start with simple optimizations like loop tiling to optimize for cache effects.
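
    For example, a minimal blocked matrix multiply in C (sizes and BLOCK are illustrative; the block size would be tuned to the cache of the machine at hand):

    #include <stddef.h>
    #include <stdio.h>

    #define N 1024      /* matrix dimension, assumed divisible by BLOCK */
    #define BLOCK 64    /* tile size: aim for three BLOCKxBLOCK tiles fitting in cache */

    static double A[N][N], B[N][N], C[N][N];

    /* Tiled (blocked) multiply: work on BLOCK x BLOCK sub-matrices so the data
       being touched stays in cache instead of being re-fetched from main memory
       on every sweep across B. */
    static void matmul_tiled(void)
    {
        for (size_t ii = 0; ii < N; ii += BLOCK)
            for (size_t kk = 0; kk < N; kk += BLOCK)
                for (size_t jj = 0; jj < N; jj += BLOCK)
                    for (size_t i = ii; i < ii + BLOCK; i++)
                        for (size_t k = kk; k < kk + BLOCK; k++) {
                            double a = A[i][k];
                            for (size_t j = jj; j < jj + BLOCK; j++)
                                C[i][j] += a * B[k][j];
                        }
    }

    int main(void)
    {
        /* A, B and C start zeroed here; fill A and B with real data before timing. */
        matmul_tiled();
        printf("C[0][0] = %g\n", C[0][0]);
        return 0;
    }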

  35. Caver_Dave Silver badge

    Time budgets

    I work in the certifiable, hard-realtime world now, but years ago had to write some echo cancellation and noise reduction code for a mobile phone producer.

    It was probably the best defined project I have ever worked on and consisted of source sound files, an algorithm and the expected output.

    The customer had working simulations in MATLAB (or similar) and had auto-converted them to C code, but this was still far too slow for their time budget and took a variable amount of time to compute the answers.

    My job was to take the algorithms and convert to assembler, in a manner so that all possible routes through the code took exactly the same number of clock cycles.

    As you can imagine the algorithms were complex, but I managed it with just one nop statement, and that was in a rarely used branch. The executable was about 30% of the size of the one produced from the C code, and always took 50% of the time of the fastest path through the C version.

    That really was worth 2 weeks of my time for the customer.

    The two most important decisions at the start of a project in my mind are always algorithm and language, whether that be a little bit of scripting for a web page, or a deeply embedded PID controller.

    1. genghis_uk
      Happy

      Re: Time budgets

      I wrote an assembler program for a telco line card and every path through the main loop had to take exactly 1ms - there were a lot of potential paths.

      Eventually I had to print it out on 30ft of paper and crawl up and down marking loops with different colour pens. It amused the office staff walking past me in the corridor!

      Happy times :)

  36. John Smith 19 Gold badge
    Unhappy

    Hmm. 5nm is 23 atoms wide.

    IOW plenty of room still to go.

    Although likely to be ballsachingly difficult (sorry, quite challenging) to get there.

    1. Charles 9

      Re: Hmm. 5nm is 23 atoms wide.

      Can't rely on the atom width at paths that small. Once you get that small, quantum phenomena come into play. Thus you have issues like quantum tunneling where subatomic particles (like electrons) suddenly appear on the other side of a barrier (which is a problem when the barrier in question is a transistor).

  37. Stephen Davison

    Code efficiency

    Java's multidimensional arrays are more efficient when the reference to the inner array is cached. Without that, most of the computation is the program trying to figure out where the data is stored in memory with modulo functions. The loop orders and values are also not very efficient unless the compiler is supposed to sort that out, which means that we're measuring differences between compilers more than anything else.

    The following optimised Java version took 42.6s on one core. It would likely speed up the code in most of the languages and remove the compiler as a factor.

    Python gets a lot of its speed from compiled C functions like sort(), but those aren't in play when it does some basic looping, so it looks very bad in an example like this.

    int size = 4096;
    double[][] A = new double[size][size];
    double[][] B = new double[size][size];
    double[][] C = new double[size][size];
    for (int i = 0; i < size; i++) {
        double[] asubarray = A[i];
        double[] csubarray = C[i];
        for (int k = 0; k < size; k++) {
            double[] bsubarray = B[k];
            double asubarrayValue = asubarray[k];
            for (int j = 0; j < size; j++) {
                csubarray[j] += asubarrayValue * bsubarray[j];
            }
        }
    }

  38. Rol

    One for the conspiracy cats

    "Hey this is the biggest breakthrough of the century. 16 cores, running at 5Ghz, on a 5nm die. It's out of this world"

    "Yeah, it took some time, but we aimed for the prize and made it"

    "Our customer's are going to go bonkers with this"

    "Err...well...not really"

    "?"

    "You see marketing has quite rightly pointed out, that giving the world this piece of kit now, is the equivalent of making everlasting lightbulbs. We'll sell millions of them and then that will be it, as every Tom Dick and Harry will have a machine that fulfils their every desire for many years to come."

    "so what will we be marketing"

    "Something that will still blow their minds, but obviously seriously cut down. a single core running at 150 Mhz with a 60Mhz bus"

    "Yep. Still quite impressive"

    "And every two years we'll double it up, so as to get people to upgrade. Instead of making millions we'll make trillions of dollars over the span of a few decades, as customers try to keep up to date"

    "Mr Moore, you are a genius."

    "You know. I predict our customers will probably see me as less of a genius and more of a prophet"

  39. Billy Bob Gascan

    Since I am bored and can't go out and ride my bike in the rain, I used Xcode to write a C program on my iMac to multiply two 4086 x 4086 matrices of 64-bit floating point numbers. I used a complex, robust random number generating algorithm to fill the arrays. This takes 17 seconds. Then I multiplied the two arrays to fill a third array. This takes 0.102632 seconds. You have to wonder how the authors could have gotten such crappy results even using crappy languages.

  40. Someone Else Silver badge
    Joke

    I'll be here all week...try the veal...

    The authors stress the need for hardware makers to focus on less rather than Moore.

    Ba-DOOM-tish!
