NASA missions are being delayed by oversubscribed, overburdened, and out-of-date supercomputers

NASA's supercomputing capabilities are not keeping pace with the latest technology developments, and are "oversubscribed and overburdened," causing delays to missions that are sometimes addressed by teams acquiring their own infrastructure. The above are some of the findings of an assessment [PDF] of the aerospace agency's …

  1. LJFox
    Joke

    Eye opening report

    This report will be an eye opener for threat actors who hadn't considered NASA as a potential target.

    1. Billy Twillig

      Re: Eye opening report

      “ Sorting out security is another item on the tiger team's to-do list..”

      They are likely already in the gates.

  2. Anonymous Coward

    Unknown scheduling practices or assumed higher costs

    "Stakeholders told us that while they know NASA has HEC cloud computing options, they were hesitant to use them due to unknown scheduling practices or assumed higher costs."

    I fail to follow the logic. Surely it shouldn't be a problem for departments to allocate time on big-iron nor have to pay for it themselves.

    1. An_Old_Dog Silver badge

      Re: Unknown scheduling practices or assumed higher costs

      I think the "HEC cloud computing options" referred to in TFA were commercial services such as AWS, Azure, etc., vs NASA's in-house supercomputing facilities.

    2. Anonymous Coward

      Re: Unknown scheduling practices or assumed higher costs

      Surely it shouldn't be a problem for departments to allocate time on big-iron nor have to pay for it themselves.

      You're forgetting Opex vs Capex.

      If you buy your own HPC capacity, you can (almost) use it for free for the lifetime of the kit. If you use a central resource, you'll be charged for its use. As a researcher, you'll get a fixed grant for a certain amount of time. Once your grant runs out, you'd no longer be allowed to use the central HPC resource as you have no money to pay for it.

      I work in HE and I see this all the time. Researchers would rather pay for a cheap NAS that they can continue to use for a decade than pay for a central storage system that is highly available, backed up, protected by firewalls and anti-ransomware protection, upgraded & updated as necessary and supported by staff who know what they're doing.

      1. Richard 12 Silver badge

        Re: Unknown scheduling practices or assumed higher costs

        Of course they do, because the central storage system is not fit for purpose.

        Not because of the technology or the support, but because the way it is administered is totally incompatible with the way it is actually funded.

        1. Yet Another Anonymous coward Silver badge

          Re: Unknown scheduling practices or assumed higher costs

          Or the central storage system charges more in grant overhead than it would cost you to buy your own NAS

      2. Missing Semicolon Silver badge

        Re: Unknown scheduling practices or assumed higher costs

        Central storage went so well for KCL, didn't it? I seem to remember that the users were still being told not to use local storage even after the massive lossage?

  3. StargateSg7 Bronze badge

    Do you mean to tell me that I have more GPUs and more 64-bit Floating Point horsepower in my basement than NASA does in its entirety? WOW! That's a bit of a shock to hear! I even have the old AMD GPU stuff no one wanted that was fully written down that is literally worth $0.00 in accounting terms AND ZERO DOLLARS in eBay sellability terms and YET I still have more computer power than they do? WTF? and DOUBLE WOW!

    Can't NASA just go on eBay and buy 1000 of the old AMD Instinct-25 GPU cards for $150 each and simply do 16-bit half-precision floating points or 32 bit Single-precision FP or 32-bit integer OPS to get the GPU power needed for their space systems? That's 20 PetaFLOPS of horsepower right there!

    V

    1. Francis Boyle

      It's just one site

      and each of those GPUs probably has many times more computing power than the junk in your basement.

      1. StargateSg7 Bronze badge

        Re: It's just one site

        I counted! I have TWO PETAFLOPS in my basement of 32-bit Floating Point calculations worth of GPU horsepower (i.e. 11.6 TeraFLOPS per GPU at 32-bit and 5.6 TeraFLOPS per GPU at 64 bits) and I have 172 of them GPU's so it ain't total junk. It's just unsellable on eBay or anywhere else due to changes made and it's all fully written-down by accounting. My workplace has a SHIPLOAD MORE of number-crunching horsepower since they make their own in-house-designed combined-CPU/GPU/DSP/Vector chips but I only get to use those for about two to four hours on Monday starting at 1 am, so I have to make do with a stack of GPU racks at home. I actually have more GPU horsepower than what the major nuclear research labs had in 2004 (i.e. Lawrence Livermore National Laboratory)! So if they can simulate nuclear weapons explosions using only TWO PETAFLOPS, I think I too can do some SERIOUS materials sciences engineering on TWO PETAFLOPS at home!

        I'm just under the limit for our local electrical power supplier (43,000 Watts) for household current for 240V at 200 Amps (maximum is 48,000 watts!) so I only run the system at full power when I need to or I hook up the propane generator which is rated for 75 KW prime power! (i.e. it's basically a detuned-for-longevity 125 horsepower V4-engine super-quiet and super-environmentally-clean propane generator!) and run it off some BBQ bottles exchanged at the local petrol station!

        V
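The totals quoted in the comment above can be checked with a few lines of arithmetic. This is only a sketch: the per-GPU throughput, the GPU count, and the 240 V / 200 A service figures are taken from the comment as stated, and the function and variable names are made up for illustration. Multiplying per-GPU figures gives a peak number, which sustained workloads usually fall short of.

```python
# Figures as quoted in the comment (not independently verified).
GPU_COUNT = 172
FP32_TFLOPS_PER_GPU = 11.6
FP64_TFLOPS_PER_GPU = 5.6
SERVICE_VOLTS = 240
SERVICE_AMPS = 200


def aggregate_pflops(gpu_count, tflops_per_gpu):
    """Peak aggregate throughput in PFLOPS (1 PFLOPS = 1000 TFLOPS)."""
    return gpu_count * tflops_per_gpu / 1000.0


def service_limit_watts(volts, amps):
    """Maximum power of a supply, using P = V * I."""
    return volts * amps


fp32_pflops = aggregate_pflops(GPU_COUNT, FP32_TFLOPS_PER_GPU)
fp64_pflops = aggregate_pflops(GPU_COUNT, FP64_TFLOPS_PER_GPU)
limit_watts = service_limit_watts(SERVICE_VOLTS, SERVICE_AMPS)

print(f"FP32 peak: {fp32_pflops:.2f} PFLOPS")
print(f"FP64 peak: {fp64_pflops:.2f} PFLOPS")
print(f"Service limit: {limit_watts} W")
```

On these inputs the 32-bit figure lands just under two petaflops, the 64-bit figure just under one, and the supply limit matches the 48,000 watts mentioned.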

        1. Roland6 Silver badge

          Re: It's just one site

          You could investigate becoming a GPU provider to NASA and associated researchers…

        2. bigphil9009

          Re: It's just one site

          You counted every single FLOP? That must have taken AGES! Seriously, who is upvoting this shite?

          1. StargateSg7 Bronze badge

            Re: It's just one site

            I can run various "Standardized Sieves and Floating Point Convolutions" which STRESS the GPU's greatly at 16 bits, 32 bits and 64 bits per Floating Point Operation in just a few hours. From there I get a large table of statistics for all the major math operations capable on this GPU array.

            I have also designed custom versions of various runtime tests that work on 16, 32 and 64 bit Signed and Unsigned Integers plus my own custom code which stresses out the video and audio filtering, compression/decompression/scaling, raytracing/graphics rendering and other intense imaging operations. I have a base Real-World Task-oriented BENCHMARKED GPU horsepower of TWO PetaFLOPS at 32-bits per Floating Point, ONE PetaFLOP at 64-bits and FOUR Peta-iOPS for 32 bit Signed Integer Operations. For 16-bit Integer operations, I have EIGHT Peta-iOPS which lets me do fancy 16-bits wide advanced Multi-State BOOLEAN operations that have degrees of truth in them used in powerful Artificial Intelligence applications such as Rules-based Expert Systems, Diffuse/CNN Neural Nets, Stable/Unstable Diffusion for image creation, and Large Language Models.

            AND .....if I want to, I can now design, simulate and build Nuclear Weapons of ANY SIZE from 10 KT up to ONE Gigaton so I can now become D. Evil and stick my pinkie finger up my nose all day long as my well-paid minions and henchmen do my dastardly dealings worldwide while I get to extort TRILLIONS of Dollars from the world economy and then invite Mariah Carey* into my life and boudoir!

            Meh!

            V

            * Mariah has got what I like and it ain't Electrolytes!

            V

            1. bigphil9009

              Re: It's just one site

              Big Wow

    2. An_Old_Dog Silver badge

      Faster than Floating Point Math ...

      ... is scaled integer arithmetic. It's a Forth (programming language) thing. The downside is that you have to know in advance the ranges of values your variables will have. But the results ... I was amazed at seeing a Commodore-64 fly at drawing Hilbert curves and Sierpinski curves (fractal patterns) in color.
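For anyone who hasn't met scaled integer (fixed-point) arithmetic, here is a minimal sketch in Python rather than Forth. The 16 fractional bits and the helper names are assumptions chosen for illustration, not anything from the comment. Values are stored as integers multiplied by a fixed scale, so adds and subtracts are plain integer operations and only multiply and divide need a rescaling step.

```python
SCALE_BITS = 16            # assumed format: 16 fractional bits
ONE = 1 << SCALE_BITS      # the fixed-point representation of 1.0


def to_fixed(x):
    """Convert a float to a scaled integer."""
    return int(round(x * ONE))


def to_float(f):
    """Convert a scaled integer back to a float, for display only."""
    return f / ONE


def fx_mul(a, b):
    """Multiply two scaled integers; the raw product carries the scale twice."""
    return (a * b) >> SCALE_BITS


def fx_div(a, b):
    """Divide two scaled integers; pre-scale the numerator to keep precision."""
    return (a << SCALE_BITS) // b


product = to_float(fx_mul(to_fixed(1.5), to_fixed(2.25)))
quotient = to_float(fx_div(to_fixed(1.0), to_fixed(4.0)))
print(product, quotient)
```

Python integers never overflow, so the sketch hides the catch the comment mentions: on a machine with fixed-width registers, the intermediate product in `fx_mul` needs twice the width of its operands, which is why the value ranges have to be known up front.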

    3. Scene it all

      I just looked up my University's supercomputing center and it has more capacity than NASA's HEC. All Nvidia Volta V100s.

  4. An_Old_Dog Silver badge

    Mind the GPU Gap

    Identify technology gaps, such as GPU transition and code modernization, essential for meeting current and future needs and strategic technological and scientific requirements

    There is too much goddamned churn, thrashing, FUD, and faddism going on in computing. There are too many people arguing, "We need to go this way!", "No, we need to go that way!", "The future is over here!", etc. Computing is littered with the discarded husks of once highly-touted methodologies, computing architectures, and programming languages. Remember flowcharts? Flowchart-generating programs? HIPO charts? Prolog? Expert systems? Simula? The Transputer? The "waterfall method"? Vector processing? Transmeta? NSP? BeOS? iAPX-432? RISC? Smalltalk? Structured walkthroughs? &c.

    And what is all this computing which NASA wants, for? They put men on the moon, and brought them safely back to Earth, using less collective computing power than what's available in a smartphone.

    1. FIA Silver badge

      Re: Mind the GPU Gap

      The "waterfall method"?

      Trust me, that hasn't gone away. :(

      Vector processing?

      That's what GPUs do.

      RISC?

      RISC has pretty much (sort of) won.

      Smalltalk?

      Which (arguably) begat other OO languages like C# or Java (or the even more similar message passing Objective-C).

      You seem to be saying because we did a thing once we should never strive to improve?

      Is society to be denied the advances since the 60s? Many of which have come about precisely because of the increase in the availability of computing power. We may have put men on the moon back then, but I've got a much better quality of life, and a much better chance of living longer into old age, and doing that whilst being fairly healthy too. This is all thanks to improvements in technology.

      1. An_Old_Dog Silver badge

        Re: Mind the GPU Gap

        (Effectively) "striving to improve" means rationally evaluating and testing potential-improvement X, rather than unthinkingly rushing toward it like a herd of mindless cattle. Marketers seem quite effective in starting stampedes.

    2. Not Yb Bronze badge

      Re: Mind the GPU Gap

      Simulating engine internals fast enough to create better engines is just one possible use of "too many petaflops to waste on the moon". Detailed aerodynamics analysis, weather prediction, trajectory optimization for some of the more ludicrous trajectory ideas that are currently "too much work to try but might be better"

      Lots of useful things to try...

  5. Pascal Monett Silver badge

    "The Space Launch System team alone spends $250,000 a year"

    Okay, this is armchair-general level, but if every team spends a quarter of a million per year, I think they could pool together and buy themselves a nice upgrade, no ?

    1. Version 1.0 Silver badge
      Happy

      Re: "The Space Launch System team alone spends $250,000 a year"

      I don't work with any NASA engineers now but many years ago I did, and it was clear how they got everything working.

      Unlike Microsoft they never just implemented an "update" everywhere, thinking it would work ... they ALWAYS spent a lot of time working with everything to verify that any potential changes would work, although their basic designs were always sophisticated initially to try and be reliable. But they never assumed that any new design was perfect, they always worked hard in every part of the environment to verify that they had something probably functional.

  6. Korev Silver badge
    Boffin

    > NASA's Advanced Supercomputing facility, for example, has just 48 GPUs alongside its 18,000 CPUs.

    Do they actually have code that benefits from GPUs?

    1. Richard 12 Silver badge

      Probably not

      As NASA generally need answers to far higher precision than the half float (or less) that most GPUs are effective at.
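To give a feel for the precision gap being discussed, the sketch below rounds a couple of values to IEEE 754 half precision and single precision using Python's standard `struct` module. The helper names and the sample values are assumptions picked for illustration; it says nothing about which precisions any particular GPU handles well.

```python
import struct


def to_half(x):
    """Round a Python float (64-bit) to 16-bit half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x):
    """Round a Python float (64-bit) to 32-bit single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


half_of_tenth = to_half(0.1)
single_of_tenth = to_single(0.1)
half_of_4097 = to_half(4097.0)
single_of_4097 = to_single(4097.0)

print(f"0.1 as half:   {half_of_tenth!r}")
print(f"0.1 as single: {single_of_tenth!r}")
print(f"4097 as half:   {half_of_4097!r}")
print(f"4097 as single: {single_of_4097!r}")
```

Half precision keeps 11 significant bits, so above 4096 it can only step in multiples of 4 and 4097 cannot be stored exactly, whereas single precision holds it without loss.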

    2. Anonymous Coward

      Numbers are flat out wrong or ancient.

      18,000 CPUs and 48 GPUs is flat out wrong. They probably meant nodes (not CPUs), and the GPU count even before the recent augmentation was more than 48 (but still not many).

      Go here for actual node/core/gpu counts: https://www.nas.nasa.gov/hecc/

      1. JudeK (Written by Reg staff)

        Re: Numbers are flat out wrong or ancient.

        These numbers are pulled directly from the audit, which was released three days ago. See page 17 of https://oig.nasa.gov/docs/IG-24-009.pdf.

        Thanks

  7. karlkarl Silver badge

    So I am assuming the NASA GPU Hackathon is just a ploy to bring your own GPUs so that they can steal them?

    https://www.nas.nasa.gov/hackathon/

    1. An_Old_Dog Silver badge

      GPU Hackathon

      Hmm ...

      1. "U.S. applicants only".

      2. Participants will get to use NASA's "Cabeus" supercomputer, which, when fully configured, will have 2,428,928 GPU cores. That's a hell of a lot more than "48". (see: https://www.nas.nasa.gov/hecc/resources/cabeus.html)

      3. Team selection is partly-based on apps which will be useful to NASA. How would we know in advance what they consider useful and relevant?

      4. Their "PRE-EVENT TRAINING INFORMATION" looks potentially-useful, even if you don't participate.

  8. Kev99 Silver badge

    Let's see. The IG thinks NASA should centralise all its HEC infrastructure and rely on the bunch of holes held together by vapor for storing mission critical, and in many cases, life dependent data. Let the missions talk together AFTER a mission is finished. I for one don't want to become Major Tom because some rectal orifices cramped NASA's style.

    1. Roland6 Silver badge

      > The IG thinks NASA should centralise all its HEC infrastructure

      If this was the UK, that would make sense, as the IG would be a political body (ie. Too many conservatives with outside business interests and mates), as then the centralised facility could then be sold off and run by Microsoft…

      1. VicMortimer Silver badge

        That's exactly how it works in the US too.

  9. ecarlseen

    Let's just end the giant simulation circle-jerk called NASA

    NASA loves to run "experiments" on computers and not build actual spacecraft. Except they aren't even experiments, because experiments only give reliable data when run with real stuff in the real world. This is how they spent many billions of dollars on a space shuttle replacement and didn't even wind up with a finished design spec, let alone an actual design. Even then, the space shuttle itself was cool and fun, but it had an obscenely bloated budget and despite NASA's extreme risk averseness still suffered a ~2% mission fatality rate. Computer simulations only find the problems you remembered to program them to find, assuming they have no bugs or design flaws or uncaught hardware malfunctions (are we far enough into fantasy land yet?). Real-world experimentation finds things like o-rings failing because they're being used outside of their rated temperatures.

    Meanwhile, SpaceX is blowing up prototypes and winding up with finished, validated designs at a very tiny fraction of the cost (<2%) of NASA's completely wasted money. SpaceX is putting literal tons of stuff into space relatively cheaply and has become so reliable that most of their launches are now boring (except for the ones that we expect to blow up; those are fun). No, the Falcon 9 rockets are not as sexy-looking as a space shuttle. They're utilitarian. They make their deliveries and come back for re-use without much fuss. SpaceX is succeeding at building the Honda sedans and Mercedes delivery vans of space flight, while NASA is failing to build custom hypercars. SpaceX is where we want to wind up with for human space travel: cheap, reliable, and uneventful.

    1. ecarlseen

      Re: Let's just end the giant simulation circle-jerk called NASA

      To put my post above in perspective, as of yesterday, March 15, 2024, SpaceX has successfully launched and landed a Falcon 9 rocket an average of once every three days in 2024 so far (25 missions in 75 days, all of them successful). This does not include the Starship launches, which are explicitly considered to be disposable prototypes.

      How freaking insane is that?

      https://x.com/SpaceX/status/1768796963041116604

      1. FIA Silver badge

        Re: Let's just end the giant simulation circle-jerk called NASA

        This does not include the Starship launches, which are explicitly considered to be disposable prototypes.

        Even then, with their third test mission, they manage to do what all other rocket companies routinely do.

        i.e. reach a stable insertion orbit with the loss of both stages.

  10. martinusher Silver badge

    Buzzword Bingo, Anyone?

    The material quoted in the article is an object lesson on how to write a lot without saying anything in particular except for the underlying message "Give us more money". Take this beauty, for example:-

    "Identify technology gaps, such as GPU transition and code modernization, essential for meeting current and future needs and strategic technological and scientific requirements;"

    Maybe it's just me but as soon as anyone uses a term like "Tiger Team" I know they're full of it!

    1. An_Old_Dog Silver badge

      Re: Buzzword Bingo, Anyone?

      Just an impression, but it seems the author of the report was trying to look good to some Big Cheese, so they started spouting buzzwords and tap-dancing. Bureaucratic self-promotion process #1:

      (a) "Find" a problem;

      (b) Suggest, in suitably vague language, how to "fix" the found problem;

      (c) Collect the "credit" for having done all this, and move on to a better, higher-paying job before anyone figures out, "Hey. This is all bullshit. As written, it is meaningless and/or un-implementable."

    2. FIA Silver badge

      Re: Buzzword Bingo, Anyone?

      Maybe it's just me but as soon as anyone uses a term like "Tiger Team" I know they're full of it!

      Or they're good at their job?

      There's a massive business speak feedback loop. Most people know it sounds bollocks, but they think they should talk like that, so they do.

      Maybe I misunderstood, but the recommendations seem to be 'stop wasting money and create a centralised team to best understand requirements across the org and implement them'. Sounds like sensible (if obvious) advice.

      Sometimes also you need someone else to say that, to give the necessary 'Ooomph' to the solution.

      One of the best contractors I ever worked with told me that most of his job is listening. He'd go in to a company and listen to the staff there, as they understood the systems and the operational problems with them. He would then end up tidying up and presenting the staff's solution to the people who'd hired him.

      He would get listened to when regular staff wouldn't as he was costing a lot of money per day, and as long as he didn't try and take credit for other people's ideas it worked well, as the problem got solved, but also the existing staff felt like they'd finally been listened to.

      It's amazing how many problems in business are human problems, not technological. Sometimes working out how to 'human' someone into doing what you need them to do is the best solution.

  11. pnico

    OK, so their computers are old and slow. That’s probably annoying.

    Is this news of their plan to (formulate a plan to) assign a task force to (start thinking about how best to) fix all the security holes reaching us before they fixed all the security holes, though?

    1. FIA Silver badge

      Are you mad?

      Who does that without first appointing the task force selection committee committee??

      I knew you modern app developers were louche, I didn't realise you were all reckless too.

  12. Anonymous Coward

    From Pioneers to out of date?

    Let me guess.

    Management took over from engineers ?

    Bet they outsourced shi* loads to some idiot company on an "approved" list.
