* Posts by HuBo

1126 publicly visible posts • joined 20 Nov 2023

Unpacking the deceptively simple science of tokenomics

HuBo Silver badge
Windows

Sounds good to me

I can see how 'tokenomics' is used here to mean the 'economics of token generation', where tokens are the output of AI (so-called) systems, and so TFA looks at 'the economics of AI inference' from the POV of power use and interactivity (responsiveness to user(s)) -- a tradeoff for Pareto probing.

The first chart is neat, dividing 10 tok/s/user (1st data point) by 3.5M tok/s/MW gives about 3 Watts per user at 'glacially slow' pace, while the 6th point's 500k tok/s/MW at 80 tok/s/user gives 160 W/user at a more usable goldilocks porridge rate. If these figures'd scale down to locally-run LLMs, one could imagine running them on anything from smartwatches to workstations (rather than datacenters) -- if they were found to be useful (of course).
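For the curious, that back-of-envelope converts as below (a quick Python sanity check on the chart values as quoted, nothing more):

```python
# Back-of-envelope: power per user (W) from the chart's two axes.
def watts_per_user(tok_s_per_mw, tok_s_per_user):
    # (tok/s/user) / (tok/s/MW) gives MW per user; scale to watts.
    return tok_s_per_user / tok_s_per_mw * 1e6

print(round(watts_per_user(3.5e6, 10)))   # 1st point, 'glacially slow': 3 W/user
print(round(watts_per_user(500e3, 80)))   # 6th point, goldilocks pace: 160 W/user
```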

The rest of TFA looks at how this varies with open source software, multi-GPU setups, mixture-of-experts, FP4 vs FP8, and so forth, which is interesting. I imagine throwing in model corsets and related special skills would further mold the curvy figures strutted on these efficiency displays ... quite informative overall imho (technically).

Intel backs SambaNova's $350M bid to challenge GPUs in AI inference

HuBo Silver badge
Windows

Nice tech

I like how dataflow is positioned in-between Von Neumann that completely separates memory and compute, and in-memory compute that completely merges them. I imagine that reconfigurable dataflow units (RDUs) help map the hardware better to the computational graphs than non-reconfigurable ones. A lot of performance gains in Von Neumann came from advanced front ends that rejigger the portion of the computational graph the CPU sees at any moment to split it over its multiple execution ports (ILP) and reap higher IPC. Dataflow should allow the compiler to do that instead, for some very high performance, even in non-matrix-vector types of workloads.

Obviously one needs both proper hardware and a great compiler for this to work right.

And while baking specific computations into a non-reconfigurable ASIC (eg. Taalas) can provide great perf and efficiency, it is a bit less flexible. Reconfigurable spatial data flow (eg. Efficient Computer) sounds interesting, but not as mature as of now I think. I wonder if Intel's rumored upcoming 'Unified Core' architecture could have something to do with any of this (or if it's just trying to merge P- and E- cores)?

AI can predict your future salary based on your photo, boffins claim

HuBo Silver badge
Windows

Re: These fools need a right trick-cyclist

I hear you, but I'm not sure where you're deriving that perspective from. Here's what they claim to do (page 9 of their 'paper'):

"we address the important question of how personality traits extracted from faces predict labor market success"
In other words, they believe that fairy tales and Hollywood movies, with typecast play actors, are reality rather than entertainment and fiction ...

They dig themselves in deeper on page 12 by claiming to "contribute to a large psychology literature that links facial attributes to personality" which is shadowy pseudoscience at best, obsessed with such things as 'facial symmetry' and 'facial width to height ratio'. They might as well contribute to mentalism and have AI (so called) tell me what's that thing in my purse I'm thinking of ATM, based on a photo of my dunce!

Worst of all though is their conclusion (page 39) that with "adoption of artificial intelligence [...] the insights from this study [...] highlight how the ability to measure personality at scale [from a face photo] can open new avenues [...] where individual differences [...] influence economic and social outcomes". WTF! </puke>.

Take that 'study' (so-called) behind a barn and put it out of its misery already!

HuBo Silver badge
Windows

Re: These fools need a right trick-cyclist

Well, the way I see it, their message is 'AI is so smart it can predict your future income from just your photo'. It is a stupid statement to convey IMO, both for the reason you state (we already know how looks unfortunately tend to affect outcomes, regardless of aptitude, at least temporarily) but also because it presents AI (so called) as some sort of all-powerful oracle of determinism, that here tends to just confirm and reinforce face-based stereotypes and biases.

It is a juvenile grade-school level affair representative of what we strive hard to ensure our kids don't get ensnared in at an early age, so they respect others not for the shallowness of their looks, but for the depth and quality of their characters (if any). Looks are so readily modified in a flash, through makeup, hairdressing, nail salons, tanning booths, contact lenses, dieting, plastic surgery, and other body modifications that may be as fleeting as youth. How would any scientist (so-called in this case) ever premise a 'study' on the notion that one photo could predict a lifetime of worth?

This 'paper' remains ridiculous and idiotic pseudoscience in my book. Looks can be (and are) deceiving. AI (so called) has no special power to infer future income from face photos, no more than it can predict future crime from photos (evidently).

HuBo Silver badge
Windows

These fools need a right trick-cyclist

This is such a naïve premise I don't see it reaching peer-reviewed publication anywhere but in some thankfully defunct 'Annals of Eugenics (1925-1954)' type of rag. Neither the TFA-linked 'a paper' NBER draft Working Paper, nor the UPenn linked SSRN (under 'investigate') constitutes peer-reviewed publication ("SSRN is not a journal, and we do not peer-review content").

The 'paper' is written by folks with BSs, MSs, and PhDs in Economics, Business, and Finance (and one A.B. Appl. Math.) but none of the psychological expertise needed for a serious study of how personality traits (their claimed 'Big 5') may relate to photographs of faces -- they don't. Also, while they cite the TFA-linked 'a 2020 Scientific Reports paper' to justify their approach, they completely ignore the critique provided in the linked 'a 2024 paper' -- thereby exhibiting a total lack of thoroughness in this sensitive topic.

They might claim their objective was to 'inform regulatory discussion' but the words 'inform' and 'regulatory' occur nowhere in their 'paper'. We don't need this type of junk 'science' anywhere on this planet ATM, physiognomy, craniometry, phrenology, racial profiling, all disguised as 'machine learning' to make it appear modern. What's next, how to use LinkedIn profile photos for the elimination of mental defect by selective sterilization and euthanasia?! </worst nausea ever>

Positron: we don’t need no fancy HBM to compete with Nvidia’s Rubin

HuBo Silver badge
Windows

Sounds reasonable

I'd say this ties in nicely with recent interest in dataflow approaches to compute by major players (AMD acqui-hiring Untether, Intel buying SambaNova, Nvidia buying Enfabrica and reverse acqui-hiring Groq, OpenAI striking a deal with Cerebras, ...).

The challenge seems to be that the 22 TB/s of peak memory bandwidth in chips like Rubin are not enough to feed their 1.2 ExaFlop/s of compute appetite at FP8, resulting in some slowdown in number crunching throughput. Cerebras' dinner plates (for example) seem to provide a better match by distributing the compute more evenly and broadly throughout the memory that feeds it (and vice versa), like many small mouths rather than one huge one (or many stomachs like cows, ostriches, the Baird’s beaked whale).
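To put a number on that mismatch: dividing the quoted compute by the quoted bandwidth gives the arithmetic intensity the chip demands to stay fed (rough roofline-style arithmetic on the figures above, nothing more):

```python
# Arithmetic intensity the hardware demands: FLOPs per byte of memory traffic.
bandwidth = 22e12   # 22 TB/s peak memory bandwidth (as quoted)
compute = 1.2e18    # 1.2 ExaFLOP/s at FP8 (as quoted)
print(f"{compute / bandwidth:,.0f} FLOPs per byte fetched")  # → 54,545
```

Unless every byte pulled from memory feeds tens of thousands of FP8 ops, the math units sit idle, hence the slowdown in crunching throughput.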

As long as the workload lends itself to this type of distributed processing then it looks like a good way to deal with memory access bottlenecks imho, and wasting less juice in the process.

'Ralph Wiggum' loop prompts Claude to vibe-clone commercial software for $10 an hour

HuBo Silver badge
Windows

Interesting stuff

Apparently, OpenAI's Codex has an internal agent loop like that, one that iterates over code production (an inner Ralph) (spotted by Benj Edwards). This raises a couple of questions, namely: doesn't claude-code have a similar inner loop, and would that not make the external bash loop somewhat redundant? And is updating the prompt (in the outer loop) needed to get an eventually satisfying output (Huntley seems to both suggest so, and not, simultaneously)?

But overall it's interesting that they're iterating over AI (so-called) tool application to refine outputs towards a (hopefully) convergent quality output (solution). It's a technique used at much finer scales (floating point ops) to solve PDEs approximated by linear algebraic matrix-vector systems for example (Richardson, Jacobi, Gauss-Seidel, SOR, ...) as well as optimization and inverse modeling (Newton method, Levenberg-Marquardt, ...) among others. It made me think of Sandia's iterative solution of PDEs using near-memory neuromorphic compute with proximal-only interactions, and that Richardson and Jacobi could be well-suited to that, even without spiking (i.e. plain-jane non-neuromorphic in-memory compute approach).

Quite stimulating imho (and with the crunch of puzzling bits) ...

Sandia boffins let three AI agents loose in the lab. Science, not chaos, ensued

HuBo Silver badge
Windows

Or maybe LLMs are less-supportive of human reasoning activities than the tools used here? Kingma and Welling saw VAEs as 'a principled framework for learning deep latent-variable models and corresponding inference models' (so combining learning and inference as in AI concepts). They suggested it's a bit like PCA but 'applies to a much broader class of continuous latent variable models' (eg. nonlinear ones and high-dimensional search spaces) -- whatever that means (ahem!).

The Sandia folks also refer to their Bayesian optimizer as an 'active learning agent' (AL agent), and their equationizer is a 'neural network-based equation learner' (nn-EQL) ... so istm they're operating in some AI/biomimetic space rather than more conventional determinism iiuc.

What's interesting though is it seems they had dug themselves into a sort of Fourier momentum-matching (grating) 'human intuition' hole back in their '2023 paper' (TFA link) and this here VAE+AL-agent+nn-EQL SDL helped them dig themselves back out by considering also 'gradient of momentum' effects (lens) in their dynamic steering of resonant metasurface quantum dot emissions (OMG!) -- results in Fig. 3 with lens curvature as x-axis, grating order as y-axis.

It seems they combined highly task-specific tools to assist their integrated analysis, rather than messing about with some everything and the kitchen sink jacks of all trades (masters of none) AGI/SuperIntelligence nonsense, which is quite sensible imho.

Bill Gates-backed startup aims to revive Moore's Law with optical transistors

HuBo Silver badge
Windows

Re: Micron scale ... transistors

The optics of their whitepaper's Table 1 are enlightening on this imho, with 52x the TOPS/mm² of an Nvidia B200 despite the coarse litho, and 39x the compute efficiency.

I wonder if their 'optical systolic arrays and metasurface tech' could work with multiple wavelengths simultaneously for extra parallel oomph.

Nvidia leans on emulation to squeeze more HPC oomph from AI chips in race against AMD

HuBo Silver badge
Windows

Re: DGEMM vs. vector FMA (attempt at) clarification

Yeah, and inasmuch as Ozaki's "shared-space splitting" approach essentially uses low-precision integer DGEMM to compute an FP64 vector FMA, it may end up ok (eg. Figure 2 and Algorithm 1 of their 'paper' ← TFA link). I expect it similarly uses integer DGETT (tensor) to compute an FP64 DGEMM ...

In a way, if all you have are 18-bit integer multiplier hardware units in an FPGA (for example), you perty much have to resort to a method like that to implement FP64 ops, hopefully with a decent timing closure.

I wonder how well it does on computing the FP64 vector tanh() and logistic sigmoids of LLM activation, sqrt(), Lennard-Jones potentials of molecular dynamics, and other such nonlinear functions (eg. through Taylor series and related multi-step methods). Would hardware FP64 units display an interesting advantage there (or not)?

Artificial brains could point the way to ultra-efficient supercomputers

HuBo Silver badge
Windows

in-Memories of a FEM fatale

It'd be cool to see this approach tested on Blumind's (Canucks) 100 billion neurons (1 quadrillion synapses) all-analog 12 Watt brain chip imho. Especially seeing how the Sandia folks "associate a small population [eg 8-16] of recurrently connected neurons with each mesh node" -- so the Blumind could help validate large-mesh scaling of their tech ...

Also interesting that Aimone-Theilman's "published" (TFA link) paper refers quite a bit to a 2013 Franco-Portuguese (Centre for the Unknown) Open Access PLOS article that focused on motor-control ODEs, particularly some "2D arm controller" (position and dynamics). Aimone-Theilman augmented each of the Franco-Portuguese "neuron with an additional state variable that integrates the local residual error" which eliminated "steady-state error", and was key to accuracy when solving their linear model elliptic PDEs. Very nice!

Most impressive (to me) is they figured how to solve a PDE while their "neurons synapse only with their nearest neighbours in the mesh", as needed for success of the in-memory compute architecture in this field (per TFAᖖ). It'll be nice to see how their approach fares in time-dependent (parabolic/hyperbolic) and nonlinear PDE problems. And also if it can be adapted to non-spiking in-memory compute solution of PDEs (or is spiking fundamental to making this work?). Fascinating stuff!

(ᖖ by comparison, linear algebraic matrix-vector solution methods, direct ones at least, would require filling up of the whole reverse Cuthill-McKee re-ordered matrix's band, which implies numerical connection to non-neighbor nodes afaict)

HuBo Silver badge
Pint

Re: The universe IS analogue - … - it simply IS, immediately and without processing delay

Cool links!

Every conference is an AI conference as Nvidia unpacks its Vera Rubin CPUs and GPUs at CES

HuBo Silver badge
Windows

How now green cow

It's impressive how their FP64 DGEMM ("emulation" in TFA), at 200 TF/s on Rubin, is 6x faster than the chip's native FP64 (33 TF/s), especially seeing how their November paper on Ozaki reported just a 2.3x speedup on Blackwell (a Riken version of Ozaki is on GitHub).

I guess the approach yields FP64 perf under emulation that's roughly half of FP32, and adding that to the native perf gives the total. For the GB200 that'd be (160 TF/s)/2 + 80 TF/s = 160 TF/s of total FP64, which is twice the native FP64 oomph iiuc.
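That guess can be written down as a toy model (my assumption, not Nvidia's documented method: emulated FP64 runs at half the native FP32 rate, added to the native FP64 rate):

```python
# Hypothetical model: FP64-via-emulation at half the native FP32 rate,
# stacked on top of the native FP64 units.
def total_fp64_tf(native_fp32_tf, native_fp64_tf):
    return native_fp32_tf / 2 + native_fp64_tf

print(total_fp64_tf(160, 80))  # GB200: → 160.0 TF/s, 2x its 80 TF/s native
```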

But then, to get 6x on Rubin, one probably needs to use both native FP32, and emulated FP32, together, to emulate FP64 (and add that to native FP64?). Total FP32 looks to be 400 TF/s already, yielding a potential 200 TF/s for emulated FP64, without adding native ...

It'd be nice to get some clarity on this from Nvidia imho (eg. could/should native FP64 be added?), and also why 4,000 TF/s of FP16 turns into just 400 TF/s of FP32 SGEMM (or 270 TF/s if FP32 native is already 130 TF/s). Inquiring minds ...

Everybody has a theory about why Nvidia dropped $20B on Groq - they're mostly wrong

HuBo Silver badge
Gimp

Re: other options

Yeah ... $20B is a bit much for a reverse acqui-hire when Intel managed to 'ambush' SambaNova's unrivaled full-stack AI for just $1.6B (or nearly so) ... suggesting masterful courtship.

And in the same throbbing vein, AMD swallowed up Untether earlier this year (acqui-hired for less than $100M), for what one expects are similar reasons.

Those dataflow moves should help them effectively tame the exponential costs of linear progress in this field imho, leaving Von Neumann in the dust for some workloads (or subsets of workloads -- as discussed in TFA).

Plenty of tasty morsels left in the sea though, not to mention related compilers and orchestration -- but too late for those salivating over Confluent it seems ...

Silicon photonics won’t matter ‘anytime soon’ says Broadcom CEO

HuBo Silver badge
Pint

Re: This thing we're behind the competition in

Exactly my thoughts! I mean, Google TPUs (developed with Broadcom and MediaTek) are all about optical interconnects, and are attracting Meta's interest (among others?). Nvidia's moving swiftly into photonics and Broadcom's retorting with photonic Tomahawks ... and, in other News, "According to Broadcom, its CPO tech is more than 3.5x more efficient than pluggables" ...

So, has Hock Tan lost touch with his company's efforts in this, or were these previous headlines meant to be humorous; what gives?

Bezos-backed Unconventional AI aims to make datacenter power problems go away

HuBo Silver badge
Pint

Quite nice

Dylan's 2022 TFA showed a cool plot of the enhanced power efficiency of neuromorphics compared to CPUs and GPUs. I'm glad Unconventional AI is getting funding to pursue this sort of event-based analog spike-processing, or whatever it is they're not telling.

There's the ARM-based SpiNNaker out there, shiitake memristors, sub-threshold transistors, the Semantic Pointer Architecture (SPA) for building a brain, whatever this is, and what have you ... plenty of concepts and room to explore. I love it!

Brains neither run at GHz speeds nor consume GW of power so if GPUs "simulating" them do, they're likely doing it wrong ...

Amazon keeps the pressure on Intel, AMD with 192-core Graviton5 CPU

HuBo Silver badge
Windows

Very nice

Great to see Graviton5 supporting PCIe 6 out of the box. Hopefully that means we soon get to see how CXL 3.0 performs in real life with its multi-level switching, coherency, peer-to-peer DMA, and memory sharing -- that should ease the task of getting multiple sockets (or computational nodes) acting rather seamlessly as a large NUMA machine, when needed.

These should then be a great target for Virtual Fugaku if they also sport solid vector units.

India has satisfied its supercomputing needs, but not its ambitions

HuBo Silver badge
Windows

Neat

We've known since Fugaku that CPU-only HPC systems can do very well at 400+ PF/s on HPL, and leading perf on HPCG (until this month's Top500), so the AUM Neoverse V2 should be plenty fine imho, even without a GPU (just needs hefty vector units). It's not as energy efficient as a hybrid CPU+GPU system but way less demanding than the GigaWatt+ systems being considered for AI (so called) training and inference these days.

Hopefully the EU can also field itself some CPU-only 100+ PF/s systems running on Rhea 1 or 2 in the near future ...

Copackaged optics have officially found their killer app - of course it's AI

HuBo Silver badge
Windows

Illuminating

Great to see this focus on CPO and related advances imho. AWS saw the light last year and showed a great before-and-after picture of how it vastly cleaned up its rack designs (the next-to-last photo). Google's also known for the reconfigurable optical interconnects used around its TPUs ...

One can send multiple signals (at different wavelengths) simultaneously, for example 128+ of them using a single ChromX laser, and do so in both directions (BiDi) at once, on a single fiber, without a need to wrap the cable in grounded foil thanks to inherent galvanic isolation. So, the throughput can be formidable, even with very few physical links.

It's super that Nvidia, Broadcom, Ayar, and even AMD (eg. Enosemi purchase) are getting to the point of deploying this tech in the field, finally! Quite notable also that Lightmatter's Passage is a reconfigurable optical interposer (iirc) which should help orchestrate throughput in response to workload, if needed. It's been a long wait ...

Nvidia's green500 dominance continues as France's Kairos super takes efficiency title

HuBo Silver badge
Holmes

Re: Interesting

... and, when the good folks at AMD, HPE Cray, and DOE/NNSA/LLNL get El Capitan to Rmax at 2.1 EF/s (i.e. 74% of its Rpeak), while maintaining its 60.9 GF/W (improving networking and/or orchestration, running at 34 MW) then it will fall on the same pareto line as Jupiter:

Leading GF/W = 85.6 - 7.44 * log10(Rmax)
The Octave/MATLAB code to add that to the above is (with a red star for the up-tuned Capitan):
hold on; semilogx(2089,60.9,'r*',logspace(0,3.5,21),85.6-7.44*linspace(0,3.5,21),'b:'); hold off
(see also the deep dive and comments on this at TNP)

HuBo Silver badge
Holmes

Interesting

With a machine's Rmax having quite an impact on GF/W (as noted in TFA), I'd submit that energy performance should be better than the following to qualify as remarkable (at present): Expected GF/W = 78.3 - 7.44 * log10(Rmax)

The two supercomputers that are clearly above that curve are Jupiter and El Capitan, but I'd expect upcoming Alice Recoque, Discovery, and other ExaFloppers to also be there (hopefully). If you have GNU Octave or MATLAB, you can check the situation with this script for Green500 systems between 55 and 73 GF/W:

nvdrmx=[3.05 9.86 6.75 7.42 4.66 24.1 3.82 19.1 2.88 1000 3.12 123. 435. 5.04 13.2 98.5 3.14 10.2 6.42];

nvdeff=[73.3 70.9 69.4 68.8 68.2 68.0 67.2 66.9 65.4 63.3 63.0 62.0 61.0 61.0 60.3 59.3 58.2 57.0 55.6];

amdrmx=[2.99 24.5 31.7 31.1 24.4 19.2 208. 68.0 1809 15.9 9.89 46.1 27.2 478. 1353];

amdeff=[66.5 66.3 64.6 64.0 62.8 62.7 61.4 61.3 60.9 59.2 59.0 58.0 57.0 56.5 55.0];

semilogx(nvdrmx,nvdeff,'go',amdrmx,amdeff,'ro',logspace(0,3.3,21),78.3-7.44*log10(logspace(0,3.3,21)),'k:');xlabel('Rmax (PF/s)');ylabel('GF/W'); legend('Nvidia','AMD','expected')

SC25 gets heavy with mega power and cooling solutions

HuBo Silver badge
Windows

Yay, nay, or ouch?

It's interesting but unfortunate (imho) that the current AI hype-bubble is leading to this focus on high-density electricity production, especially through thermal techs. I can't help but think that what is truly needed are better ways to extract controlled electron flows directly from the otherwise disordered inner-state of matter, possibly through new metamaterials that act as one-way valves, or diodes, at the level of elemental pseudo-particles/waves (quantum or not). Maybe harvesting nuclear radiation through photovoltaics could work there to some extent (or somesuch)?

The days of LLM tech (with its wasteful energy consumption) as lead AI prospect are probably numbered as well. For example, it seems from Thomas Hubert and team (a name almost as nice as "Bert Hubert") that Gemini's AlphaProof's relative success at Math Olympiads is mostly thanks to its use of classical AI, namely using L∃∀N (for formal mathematical reasoning) and tree search (some version of the A* algorithm?) to do its do, coupled with whatever Test-Time Reinforcement Learning (TTRL) is. It may match Gary Marcus's "neurosymbolic techniques" perspectives as well as that of DARPA's Shafto, and clearly doesn't work at all without the classical AI part, period.

Accordingly, datacenters that "will exceed 400,000" GPUs sound like a huge waste of resources if their focus will be on running LLMs. If they consume 150x what El Capitan does and yet don't produce 150x the computational oomph (at proper FP64 for HPC, and INT64 for classical AI) then they are a huge waste, full stop. A proper 150x El Capitan would crank 300 FP64 ExaFLOPs, which with MxP may result in 3.0 ZettaFLOPs of performance, and finally allow for high-resolution climate simulations at Earth-scale (among others). Granted the ICON team received the Gordon Bell Prize for climate modelling yesterday for its "Computing the Full Earth System at 1 km Resolution" on JEDI, Alps, and Jupiter, but using other physically-based models, or enhancing that resolution further (eg. to predict traveling wave derechos and traveling swirl tornados), still mandates Zettascale computing (iiuc).

Oh, and (almost unrelated) the other Gordon Bell Prize this year is for the Tsunami prediction research covered here back in August by Tobias ... cool stuff (imho)!

Eviden set to build France's first exascale supercomputer with AMD at the wheel

HuBo Silver badge
Pint

Way to go

Good to see Eviden progressing on its 2ⁿᵈ ExaFLOP system. The 12 MW looks ambitious at first but seeing how it is 2/5ᵗʰ of El Capitan's power draw it should lead to at least 2/5ᵗʰ its Rpeak, so 1.12+ EF/s. Some 15% generational perf improvement coupled with 78% efficiency (from the faster BXI?) could then reasonably give it an Rmax of 1.0 EF/s (or 20% on perf and 74% eff, ...).
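Spelling out that guesswork (assumed El Capitan figures: ~30 MW draw and 2.79 EF/s Rpeak; the generational and efficiency factors are the guesses above, not Eviden's numbers):

```python
# Hedged scaling guess for Eviden's machine, from El Capitan's (assumed) figures.
el_cap_rpeak_ef, power_ratio = 2.79, 12 / 30     # assumed: 2.79 EF/s Rpeak, ~30 MW
rpeak_est = el_cap_rpeak_ef * power_ratio        # linear-in-power Rpeak estimate
rmax_est = rpeak_est * 1.15 * 0.78               # +15% gen. perf, 78% HPL efficiency
print(round(rpeak_est, 2), round(rmax_est, 2))   # → 1.12 1.0 (EF/s)
```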

What would be truly awesome though is if we could get 1 EF/s in 1 MW (or 1 PF/s in 1 kW) in the near-ish future (in full-fledged FP64)!

Scientific computing is about to get a massive injection of AI

HuBo Silver badge
Holmes

Blackwell: "high-performance linpack declined [but] (HPCG) benchmark – rose"

Good point. I guess that means we can approximate CHIE-4's HPCG/HPL ratio (Xeon+Blackwell) from ABCI 3.0's (Xeon+H200) with those figures (and/or vice-versa) ... say: (2.446/145)*(45/34)/(45/67) = 3.3% to CHIE-4's actual 2.8% ... perty close (rounding to 3%)!
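For anyone who wants to check that arithmetic (all figures as quoted above):

```python
# CHIE-4 HPCG/HPL ratio approximated from ABCI 3.0's, per the comment's figures.
abci3_ratio = 2.446 / 145                        # ABCI 3.0: HPCG over HPL (PF/s)
chie4_est = abci3_ratio * (45 / 34) / (45 / 67)  # correction factors as quoted
print(f"{chie4_est:.1%}")                        # → 3.3%, vs CHIE-4's actual 2.8%
```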

Europe joins US as exascale superpower after Jupiter clinches Top500 run

HuBo Silver badge
Windows

Nice

Good to see Jupiter hitting 1.00 EF/s in 16 MW. That makes it #14 in Green500 for power efficiency, which is the first position in Exascalers (eg. El Capitan is #23). That's the right direction to go towards.

CHIE-4 is also interesting as the highest performing Xeon+B200 system so far. Its 24x MxP speedup is remarkable and hopefully won't be revised down as Tsubame4 was -- from 25x in Nov. '24 to a still respectable 16x in June '25. It's also neat that its HPCG perf is around 3% of its oomph on HPL, which is close to the ratio on CPU-only systems (Fugaku, Crossroads), better than other CPU+GPU machines (typ. near 1%), but still below vector engines (AOBA-S, TSUBASA, near 5%). If that's due to Blackwell (GPUDirect RDMA enhancements?) then it's great design, and I can't wait to see what the MI430X riposte will do!

Baidu answers China's call for home-grown silicon with custom AI accelerators

HuBo Silver badge
Windows

All tease and no strip

Hmmm ... I'd say the press releases on Baidu's Kunlun M100+ baked inference chips are rather thin-&-light on details (internal arch, networking, TDP, perf, litho, chip photo, die shot, supporting software stack ...). And if the 2 trillion (2-T) parms ERNIE 5.0 is out now, it must have been trained on some other hardware, maybe some Ascend 910 (as found in CloudMatrix 384) similar to how Huawei's 1-T PanGu-Σ was trained ... or some pre-sanctions Nvidia kit? More questions than answers here ...

As for questions, they also "announced" (TFA link) Famou: "the World's First Commercially Available Self-evolving Agent", "able to quickly abstract complex problems and iterate automatically as conditions change" ... what!!?? Details and examples would be welcome on their part on this given the boldness of the claims.

Speaking of 1-T parameters though, Argonne's Ian Foster (and team) seems to think "an AI-native Scientific Discovery Platform (SDP) that connects models to tools, data, HPC, and robotics" based on "science-tuned foundation models" of that size could be worthwhile. I have my doubts. But he'll present that concept next Friday at SC25 (and there's an ArXiv on this from last year) ... might be worth a gander (or not?).

HPE details Vera Rubin blades for next-gen Cray supercomputers

HuBo Silver badge
Windows

Yummy

Looking forward to seeing more details on MI430X performance, with the hope of a cool 1.0+ FP64 PetaFlop/s in 1 kW for example (like a Roadrunner on a chip, but 1000x more power efficient).

Wonder what kind of new news there'll be on this in St.Louis at SC25 (Sunday to Friday)?

Microsoft teases agents that become ‘independent users within the workforce’

HuBo Silver badge
Pint

Re: Automate HR....

Cool analogy! Had to look it up and found it in the French WikiPedia (bio-robots), and the German WikiPedia (Bioroboter), but not in the English WikiPedia for some reason ... but it's at the NIH's NLM as biorobots, and there's a nice picture from Swiss (Cheese) media here.

Reminds me of Ludwig Von 88's song Cs 137 that goes "Nous les gars de Pripiat ..." (We, the guys from Pripyat ...). The efficiency and progress (ours once more) of this Agentic AI (so-called) virtual workfarce may well just lead to similar devastation ... with big beautiful workplace buildings empty of life until it's time for post-disaster decontamination ... by us, the disposable bio-robots ... imho!

MIT Sloan quietly shelves AI ransomware study after researcher calls BS

HuBo Silver badge
Pint

Refreshing

I love Beaumont and Hutchins' takes on this, which I'd summarize as: it's absurd jaw droppingly bad corporate marketing bozos cyberslop nonsense ... just rolls off the tongue!

Great to see outlandish AI claims being taken down a notch this way.

There's mushroom for improvement in fungal computing

HuBo Silver badge
Thumb Up

Love it

Seeing how shiitake essentially consist of a whole bunch of memristors that are fundamental to low-power neuromorphic computing makes me wonder if thin-slicing them and inserting the result in a sandwich of BCI electrodes (grid arrays as found in a toaster, kind of) could result in interesting learning or stimulus processing abilities? Or would it be necessary to grow the shiitakes in the shape of a cauliflower first, possibly through transgenic hybridization (by gene gun? Like oyster mushrooms?)? Would the shroom's 36,000 sexes get in the way? Could these then be used for brain transplants?

The possibilities seem endless ... ;) (not to mention the delicious recipes!)

This is Doom, running headless, on Ubuntu Arm… on a satellite

HuBo Silver badge
Pint

Re: I'm still confused, 14 years later...

Good point! I guess if 1992 was year 1 of id Doom development, 1993 was year 2, and November 2011 became the start of year 20 ...

But hey, cool TFA and video (well worth the 8 minutes)! And as Ólafur notes, it should be interesting to see what folks end up doing with OPS-SAT VOLT (expected to launch in 2026) that'll feature the Leopard DPU with its Zynq UltraScale FPGA sporting quad Cortex-A53 CPUs (as compared to the Cyclone V SoC's dual-core Cortex-A9 in this past OPS-SAT).

Should be fun to watch ...

Berkeley boffins build better load balancing algo with AI

HuBo Silver badge
Windows

A bit surprised

In their paper, the UC Berkeley crew writes:

"OpenEvolve independently rediscovered and fully exploited a tensorized zigzag partitioning scheme, yielding an evolved EPLB algorithm that achieves a 5.0× speedup"

suggesting to me that the speedup technique was already known, but not necessarily applied to this specific problem(?). It also seems notable that in their Table 1, a lot of OpenEvolve applications resulted in 7% speedups (only), with "Adaptive Weight Compression" coming out 14% worse with the AI "optimization" than without ... maybe they'll want to tone down the "amazing" "never seen before" "5x" speedup bit a little ...

Despite this though, I wouldn't be against seeing ADRS applied to improving on reverse Cuthill-McKee algorithms, multi-frontal methods, Delaunay tetrahedralization with matching faces, ADI over simplices (NOT rectangles), and the likes, especially in the context of large multi-CPU systems where the type of EPLB they investigated is important. Could be interesting for reconfigurable Maverick-2-type systems too!

Bottom line, if this tech can help tame the CPU memory problem and make the Pain of Parallel Programming more bearable, then I'm all for it. Hopefully they get tested over properly serious challenges though, rather than AI nombrilism ... imho.

Don't take AI to Thanksgiving: Bots have hidden biases

HuBo Silver badge
Pint

Re: Sussman attains enlightenment

¡Moscas gratas muynacho! Your kōan-foo carves me bar just right ... Zen up++! ;)

NextSilicon Maverick-2 promises to blow away the HPC market Nvidia left behind

HuBo Silver badge
Windows

Much needed

Yeah, the good folks at Sandia Vanguard describe the Mavericks as a runtime-reconfigurable accelerator which vastly helps it adapt its dataflow to workload specifics ... very neat! We badly need this capability also in scale-up/out networking to propagate the benefits of this flexibility to the system scale (with PCIe 6, CXL 3, and CPO).

The 600 GF/s FP64 perf on HPCG may sound low compared to 45 TF/s dense (eg. HPL) on a GB200, but checking with Top500 shows that the HPCG perf of Frontier (MI250X), Aurora (GPU Max), and Alps (GH200) is less than 1% of their perf on dense HPL (aka Rmax). In other words, it would take a 60+ TF/s FP64 GPU to get the 600 GF/s on HPCG that Maverick-2 (750W dual-die) gets. Interestingly, TNP reports (linked under "pointed out") that this Maverick cranks 40 TF/s on dense calcs, making its HPCG oomph 1.5% of its dense grunt, which is 1.5x to 3x better than seen in current Top500 GPUs ... nice!
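Running the figures quoted above as a quick sanity check (plain arithmetic on the numbers as reported, nothing measured by me):

```python
# Maverick-2: 600 GF/s FP64 on HPCG vs ~40 TF/s dense (per TNP)
maverick_hpcg = 600e9
maverick_dense = 40e12
ratio = maverick_hpcg / maverick_dense   # fraction of dense peak kept on HPCG

# A GPU sustaining HPCG at ~1% of dense FP64 peak (Top500-typical)
# would need this much dense grunt to match 600 GF/s on HPCG:
gpu_peak_needed = maverick_hpcg / 0.01   # ~60 TF/s
```

So the 1.5% HPCG-to-dense ratio, and the "it would take a 60+ TF/s GPU" line, both fall straight out of the quoted numbers.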

The 2023 Gordon Bell Prize for Climate Modelling rewarded the SCREAM team for their pioneering exascale 1.26 simulated years per day of cloud-resolving earth atmosphere simulation at 3.25 km resolution. Getting the resolution down to 1 km should require (3.25 x 3.25 x 3.25)² more computations (approx. 1000x, i.e. Zettaflopping) and any tech that helps us get there efficiently is welcome imho (eg. Maverick-2). Meanwhile, some folks claim they can compute the Full Earth System at 1km, with 91.8 simulated days per day (1/4-year per day), on Alps and Jupiter, which should be interesting to see at SC25, if it works (software and hardware combined to improve perf further than either alone?)!
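Taking the comment's own scaling formula at face value (this just evaluates the stated expression; whether (3.25³)² is the right cost model for 3.25 km → 1 km refinement is a separate question):

```python
# The comment's estimate: (3.25 x 3.25 x 3.25)^2 more computations
factor = (3.25 ** 3) ** 2   # ~1178, i.e. roughly the "approx. 1000x" claimed
```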

AI boffins teach office supplies to predict your next move

HuBo Silver badge
Holmes

Re: Teach the objects to sing...

Yeah ... reminds me a bit more of skutters though, as you can see with the moving trivets, shaking keycup, and usb-plug zamboni in the lab's 30-second youtube "Towards Unobtrusive Physical AI" and their 4:46 video to the right of that which has the stapler-moving unit near 4:20 (mostly harmless except for the knife shuffling at 3:00 and 3:20 -- suspense!).

It's cool as experimentation imho (research), especially the 30-sec youtube of "Constraint-Driven Robotic Surfaces" (on the same lab page) that has a shape-shifting wall that automatically re-multi-purposes itself in response to perceived intent by whomever is there ... (I'd say it could at least be part of a fun futuristic movie!). Interesting angles imho ...

The $100B memory war: Inside the battle for AI's future

HuBo Silver badge
Windows

To HBM or not to LPDDR5x, and/or MR DIMM?

HBM might be a bit of a stopgap measure in this tensor-matrix-vector compute tech that underlies today's LLMs and related AI (so-called). Long-term, it should be best to consider in-memory compute and dataflow archs that are best suited to this kind of workload that involves very little case analysis and branching (dispatch) but consist instead of a buttload of multiplications and additions performed on hefty datasets held in memory. Distributing a whole bunch of weak-and-simple compute units throughout memory makes the most sense for this imho.

As an almost unrelated anecdote, I was comparing perf of an ILR (graph) interpreter and a JIT on some microcontrollers, with the JIT being generally 3x faster than the interpreter ... except on a board with a 600 MHz 32-bit MCU (Cortex-M7) hooked to 16-bit wide 150 MHz external RAM. For this board with 8x speed ratio (for 32-bit data), JIT code performed no faster than ILR interpretation. Analyzing the situation showed that the JIT process specialized the code such that case dispatches were removed (no longer needed) leaving mostly memory accesses and unconditional branches to be executed (with adds and mults in-between). The ratio of memory access to compute thereby increased relative to ILR interpretation, highlighting the severely limited speed of attached RAM (maybe that's what's going on with Python's JIT too?). In this case, reducing MCU clock to 150 MHz made the JIT 4x faster than ILR (at this same lower clock), but one would have obviously preferred for the RAM to be 4x faster instead and properly take advantage of the MCU's 600 MHz capability (premium).
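The "8x speed ratio" in the anecdote above falls out of simple arithmetic: the core-to-RAM clock ratio times the number of bus beats needed per 32-bit word. A sketch for illustration:

```python
# 600 MHz 32-bit MCU core fetching 32-bit words from 150 MHz RAM
# over a 16-bit-wide external bus
core_mhz, ram_mhz = 600, 150
core_width_bits, bus_width_bits = 32, 16

beats_per_word = core_width_bits // bus_width_bits   # 2 bus beats per word
ratio = (core_mhz / ram_mhz) * beats_per_word        # core cycles per RAM word
```

Dropping the core clock to 150 MHz shrinks that ratio to 2, which is why the JIT's mostly-memory-access code stream suddenly pulled ahead at the lower clock.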

Anyways, as stressed in TFA RAM speed is important (get the fastest you can), but also arch. For AI, getting compute units and mem close to one another, and distributed if possible, should help. Some other workloads may however benefit more from a graph processing beast arch ... among others, imho!

18 zettaFLOPS of new AI compute coming online from Oracle late next year

HuBo Silver badge
Windows

ZettaFLOPS shmzettaflops.

Wow! With SC25 (Nov 16–21) St. Louis, MO (not quite nicknamed "Chess capital lion of the gateway valley mound to the West World") right around the corner, I can't help thinking that if FP64 oomph scaled like FP4-to-TF32 in those GPUs (making it about 2 PF/s), an 800,000 GPU system would crank 1.6 ZettaFlops of HPC-appropriate horsepower (very much needed for very high-resolution whole-earth climate modeling)!

Instead of that, at only 80 TF/s per GPU in FP64, such a system may put out just 64 Exaflops/s (EF/s) of compute (if they're all efficiently linked together) ... pfaaah! Even MxP might only raise this to a measly 640 EF/s of useful number crunching ... should I really have to get myself out of bed for this?! </he-he-he!>
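The arithmetic behind those two scenarios, spelled out (figures as quoted above; the ~10x MxP uplift is the comment's own implied factor):

```python
gpus = 800_000
fp64_as_fp4_rate = 2e15   # hypothetical ~2 PF/s per GPU if FP64 scaled like FP4
fp64_actual = 80e12       # 80 TF/s FP64 per GPU, as quoted

zetta_dream = gpus * fp64_as_fp4_rate   # 1.6e21 -> 1.6 ZettaFLOPS
exa_reality = gpus * fp64_actual        # 6.4e19 -> 64 ExaFLOPS
exa_with_mxp = exa_reality * 10         # 6.4e20 -> 640 EF/s with ~10x MxP
```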

DGX Spark, Nvidia’s tiniest supercomputer, tackles large models at solid speeds

HuBo Silver badge
Windows

Great review

Very thorough and complete. It'd be great if the usual suspects would send the ElReg Review Bureau a set of their Thor, M5 Max and Strix Halo units for a further comparative analysis imho!

It looks like the 128GB of RAM and 500 TF/s of dense oomph at FP4 (1 PF/s sparse) are the key features of this very cute desk machine, about the size of a Mac Mini of some generation. The Thor might be twice as fast on models of the same size (if MEM bandwidth is ok) but it is grey-ugly by comparison. And it seems there's no real purpose running LLMs of 3 billion parameters and smaller locally on Core i3 Whiskey Lakes with no usable GPU, so this Spark's ability to fine-tune 70B models, and run 120B models, could prove useful to some folks (eg. workers whose boss insists they use AI on the job, retirees endeavoring to stay in touch with tech, etc ...). But of course, they'll need to fix the Firefox out-of-memory issue first!

This unit reminds me that AMD should have put out a ½-, ⅓-, or ¼-scale MI300A, with added FP4 support, for exactly this space. Seems to me it could be solid competition to the GB10/Spark, and maybe help hook kids in early and often, cementing their favoring of the ROCm SOCm ecosystem and suchlikes! The competition should also help make such devices more affordable, and hopefully they'd sustain solid FP64 perf so that proper grad students involved in HPC might make good use of them too!

OpenAI GPT-5: great taste, less filling, now with 30% less bias

HuBo Silver badge
Gimp

Well, I think the outsized cognitive dissonance of the extreme right-wing is such that they label other extreme right-wingers that are of different genders, skin tone, or religion, as extreme left-wingers (cracks me right up!). But the jury's been out with a 100-page report and verdict on this for some time:

"jihadists too are extreme right-wing actors even if they are rarely referred to in such terms."

Then again, extreme right-wing mass murderers, like Marc Lépine (killed 14, in 1989, in Canada), Timothy McVeigh (killed 168, in 1995), Anders Breivik (killed 77, in 2011, in Norway), Dylann Roof (killed 9, in 2015), Alexandre Bissonnette (killed 6, in 2017, in Canada), Brenton Tarrant (killed 51, in 2019, in New Zealand), haven't really needed the help of Osama Bin Laden (killed 2,977, in 2001) to get noticed for their insane horrors.

The CSIS analysis of Left-Wing Terrorism and Political Violence in the United States over the past 10 years excludes those (by date or location) but still notes:

"Since 2020, only two fatalities have resulted from left-wing terrorist attacks in the United States: Luigi Mangione’s assassination of UnitedHealthcare CEO Brian Thompson in New York City in December 2024 and Michael Reinoehl’s fatal shooting of right-wing protester Aaron Danielson in Portland, Oregon, in August 2020 (if the Kirk killing is included, as seems likely, it would be a third fatality). Right-wing and jihadist attacks, by contrast, have caused far higher fatalities."

The data in their Table 1 drives the point home further imho, in the past 10 years, in the US, extreme right-wing non-jihad and extreme right-wing jihad have each individually been close to one order of magnitude more deadly than extreme left-wing actions.

Should anyone in their right mind board a plane from an airline that crashes one order of magnitude more often than others? Shouldn't an order of magnitude greater resources (including policing) be dedicated to addressing the order of magnitude greater threat to human life?

HuBo Silver badge
Gimp

Hmmmm ... let's first mention that the nutbag 100 million deaths overblows even the most wackily revisionist extreme right-wing completely made-up estimate (70 million). A better informed value for the death toll resulting from this historical human tragedy would be 3 million individuals (or even 0.4 to 7.7 million) -- still a terrible 10 years.

In terms relative to the population of the countries where the deaths took place though, 13 million deaths from an 80 million person country ruled by medieval inbred narcissists of the extreme right-wing persuasion is significantly more intense than 3 million deaths from a 750 million person country.

To illustrate the difference in proportion, if a World of 8 billion people was ruled by extreme right-wing nazi wankers, one would expect them to exterminate 1.3 billion people to quench their lust for the blood and domination of others, plus other unmentionable obsessions. And if that world was ruled by a misguided Mao Zedong instead, the death toll would be 0.03 billion (1/40ᵗʰ of the nazi's). I'd obviously prefer a world with no death toll at all but that's a separate discussion ... (here we just want to establish who's worse iiuc).

From these figures then, extreme right-wing nutjobs are 40x more of a threat to humanity than extreme left-wing folks ... (just my analysis though; not an historian or anything ... but that figure does feel like it accurately matches my lived experience of the real world).

HuBo Silver badge
Gimp

Hmmmm ... Ain't the extreme right-wing the folks who exterminated 13 million innocents, including 6 million Jews, 500K Romanis, 300K disabled, thousands of homosexuals and so forth?

I think we need more of the extreme anti-fascist, extreme anti-racist, and extreme left-wing kind of folks to compensate for the extreme cretinism of the extreme right-wingers ... (just my opinion though).

Inside the belly of the beast: A technical walk through Intel's 18A production facility at Fab52

HuBo Silver badge
Pint

Re: Interesting tour

Thanks for the correction! (had to look it up in my old notes and Gareth's July piece ... quite the soap opera!)

HuBo Silver badge
Windows

Interesting tour

I wonder how breathing that highly-filtered air compares to the bog standard air outside and elsewhere (I guess one generates fewer snots and boogers in Fab52, and may sneeze less from allergens?).

It'll be great to see Panther Lake (Intel 18A and 3, and TSMC, on Foveros) and Clearwater Forest (Intel 18A, 3, and 7, on EMIB) in action when they come out (full disclosure: what I really really want is Diamond Rapids though!). And everyone'll be happy to get answers on 18A yields imho (hopefully very good), after that slight delay to get it tuned right ... especially since this is done with brand spanking new tech-leading High NA EUV litho machines (0.55-NA) from ASML, that Intel was first to get (a nice lead)!

Hobble your AI agents to prevent them from hurting you too badly

HuBo Silver badge
Pirate

Re: Sick of that tech-bro manchild superhero schtick...

Eat the rich! (but NOT the OP, Rich 11!)

How chatbots are coaching vulnerable users into crisis

HuBo Silver badge

Cogito CoT, ergo sum?

Yeah, Nietzsche's revisit of Descartes' "I think therefore I am" (in: Beyond good and evil) just hammered that nail even deeper into our collective coffin (paraphrasing): "is it the I that does the think, or is it the think that does the I?" (iiuc).

I mean, these philodudes would have it that if we take the PoV that a software box can think (somehow) then it is a being (an I, possibly an agent with individuality, or multidividuality), or can come into being, or can create one ... and that's smoking some pretty potent far-out fully-baked stuff in my rolling paper book!

The PoV's clearly hallucinating an imaginary panorama where a bunch of randomized matrix-vector multiplications (aka stochastic linear algebra), suddenly generate such phenomena as intelligence and cognition (by so-called "emergence"), when scaled big enough to be inscrutable from the outside, essentially equating them with magic, prestidigitation, and related illusionisms.

Inasmuch as such magical thinking can readily turn pumpkins into golden carriages, it shouldn't be any surprise that it can also just as easily turn average humans into deliciously delirious fruitcakes, outright dummies, and dependents of decreased prosocial intention, imho ... (in the real world, unfortunately).

Invest in straitjackets (I think)!

Qualcomm solders Arduino to its edge AI ambitions, debuts Raspberry Pi rival

HuBo Silver badge
Windows

A bit worrisome I guess, like Intel-Altera, AMD-Xilinx, IBM-RedHat ...

But to the UNO Q: it has a quad-core Cortex-A53 (2.3 DMIPS/MHz) at 2 GHz, plus a Cortex-M33 at 160 MHz, which is not groundbreaking given Raspberry Pi 4's (2019) quad Cortex-A72 (4.7 DMIPS/MHz) and especially 5's (2023) quad Cortex-A76 (10.7 DMIPS/MHz). The UNO Q is more like the 10-year-old Raspberry Pi 3/3+ (Cortex-A53), it seems to me ...
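Putting rough numbers on that comparison, cores × DMIPS/MHz × clock (the Pi clock figures of 1.5 GHz for the Pi 4 and 2.4 GHz for the Pi 5 are my assumption, not from the spec sheets quoted above):

```python
def soc_dmips(cores, dmips_per_mhz, mhz):
    """Rough whole-SoC Dhrystone throughput: cores x DMIPS/MHz x clock."""
    return cores * dmips_per_mhz * mhz

uno_q = soc_dmips(4, 2.3, 2000)   # quad Cortex-A53 @ 2 GHz
pi4   = soc_dmips(4, 4.7, 1500)   # quad Cortex-A72 @ 1.5 GHz (assumed clock)
pi5   = soc_dmips(4, 10.7, 2400)  # quad Cortex-A76 @ 2.4 GHz (assumed clock)
```

On this crude metric the UNO Q lands below even the 2019 Pi 4, and the Pi 5 is several times faster still.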

Also, it looks like HDMI has to go through USB-C (MIPI-DSI "DisplayPort Alt-Mode on USB-C" from their datasheet PDF, no dedicated connector), so a keyboard-mouse may have to share that plug (or bluetooth), and an onboard SD-card cage would have been nice ...

Makes me wonder if the upcoming Qualcomm Dragonwing™ IQ-9075 EVK will get Arduino branding as well (the ARM cores in there do 12 DMIPS/MHz iiuc -- much more competitive)?

The overall feeling I get is that Qualcomm(+Arduino) may be trying to position some boards/systems in the space between Raspberry Pi and Nvidia Jetson. This will surely require more oomph than this here initial UNO Q salvo though!

Smart-blooded super soldiers: Coming soon from DARPA

HuBo Silver badge
Pint

Re: Fat chance

Good points! It does make that DARPA voodoo hybrid chicken-blood drinking supernatural powers seeking endeavor much more realistic, and low cholesterol to boot ...

The opportunities for convergent cross-species evolution will likely be endless indeed! %~O

HuBo Silver badge
FAIL

Fat chance

AFAICT red blood cells are too small for this — they need to be so that they can move through capillaries, and bend. They don't have the volume to be fitted with a nucleus, endoplasmic reticulum, or mitochondria. Good luck trying to make them synthesize anything, other than by pure magic.

Arm bets on CPU-based AI with Lumex chips for smartphones

HuBo Silver badge
Pint

Re: To SLM or not to LLM, that is the prompt?

Nice! The emperor penguin egg looks a bit small but the others are quite acceptable imho ...

How much RAM in your phone, and was this 7B llama run CPU-only?

NASA finds best evidence of life on Mars so far

HuBo Silver badge
Alien

Where astronomy meets gastronomy

Well duh! I mean, beyond the bloated discomforts of gas giants, we've known for quite some time that space is full of life, epicureanly so ...

Just consider Juno's recent snaps of Io's hole-in-the-wall greasy spoon specialty space olive focaccia ... at least as mouth-watering as the earthen version in my book (beats even a French fougasse)!

And NO! We don't cook rocks to make such delicacies, and sure don't put leopards in them either (too gamey), but poppy seeds ... why not. And for the technically-leaning, you may want to think of these yummy foods' ornamentations as resulting from coupled bioreactive-diffusive processes that produce Turing patterns, thanks to the unique skills of those craftaliens and microbaliens involved in producing their gooey-goodness (eg. see Fig. 5 b and c here).

And so it is with the most intricate of Martian gastronomy as presented in TFA. Not so much for the gregite that is reminiscent of our ancient scaly-foot gastropods (we are not French), and certainly not for Nicky Fox's comments that this is all just "poo" (we are not British either), but for what clearly comes out of Fig. 5 top-left in "the paper published" (TFA's link to Nature).

Don't let the reaction rims, fronts, and nodules blind you to what is depicted here ... look at the big picture, beyond the 10 x 3 mm periwinkle rectangle ... what do you see? It's a masterpiece of Martian red pistachio bread, that's what!

And if that ain't life my earthly friends, then I don't know what is ... </martian_gastronomy_humour>
