* Posts by Tom Womack

178 publicly visible posts • joined 7 Jun 2008

Dell discloses monster 20-petaFLOPS desktop built on Nvidia's GB300 Superchip

Tom Womack

Re: FP4??!!

Normally you use the FP4 to compress the weights by which you're multiplying an FP16 vector and accumulating into an FP32.

I'm slightly surprised that they use a fixed FP4 format - using the four bits as index into a float16[16] would be a lot more flexible, but maybe the cost in silicon for that many multiplexers is actually perceptible whilst it's just wires and a tiny lookup table to convert FP4 to FP16.
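A rough sketch of the two decoding schemes contrasted above, in Python rather than silicon. The post does not say which FP4 layout is meant, so the E2M1 value table (1 sign bit, 2 exponent bits, 1 mantissa bit) is an assumption, and the function names are made up for illustration.

```python
from typing import Callable, Sequence

# Assumption: FP4 here means the E2M1 layout, whose eight magnitudes are these.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_fixed_fp4(code: int) -> float:
    """Fixed format: top bit is the sign, low three bits select the magnitude."""
    sign = -1.0 if code & 0b1000 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0b0111]


def make_codebook_decoder(codebook: Sequence[float]) -> Callable[[int], float]:
    """Flexible format: the four bits index an arbitrary 16-entry table."""
    if len(codebook) != 16:
        raise ValueError("a 4-bit index needs exactly 16 entries")
    return lambda code: codebook[code & 0b1111]


def dot_with_4bit_weights(codes, activations, decode) -> float:
    """Expand each 4-bit weight, multiply by the activation, accumulate wide."""
    acc = 0.0  # stands in for the FP32 accumulator
    for code, x in zip(codes, activations):
        acc += decode(code) * x
    return acc
```

The only difference between the two paths is whether the 16 decoded values are hard-wired or loadable, which is the flexibility-versus-multiplexers trade the post is guessing at.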

Rocket Lab says NASA lacks leadership on Mars Sample Return

Tom Womack

Northrop Grumman are proposing two solid stages

https://www.jpl.nasa.gov/news/nasa-mars-ascent-vehicle-continues-progress-toward-mars-sample-return/

https://ntrs.nasa.gov/api/citations/20200002328/downloads/20200002328.pdf

NG have got to the point of doing Earth tests of the solid rockets they plan to use - three metres tall, 0.5-metre diameter, two solid stages.

"Ultimately, SRM1 featured 216kg of propellant, while SRM2 featured 54.4kg of propellant. Although the actual solid motor designs cannot be shown due to International Traffic in Arms Regulations (ITAR) ..."

Talk of Broadcom and TSMC grabbing pieces of Intel lights fire under investors

Tom Womack

Intel is manufacturing the processors they care about - the $18,000 128-core Granite Rapids sixth-generation Xeons - in Intel Foundry.

The low profit margin desktop processors which TSMC can make, they're letting TSMC make, because they have capacity issues at the foundry and it's not worth making $400 chips on machinery that could be making $18,000 chips.

The channel stands corrected: Hardware is a refresh cycle business now

Tom Womack

Re: Money dear boy

Desktop hardware is amazingly cheap - you can do most things with an RPi400, the problem is that IT doesn't like handing out things that look like toys and people don't like feeling that IT has issued them with a toy. If you insist on giving employees portable computers then the bottom of the line MacBook is also enough for most things.

In several areas where costs have stayed high, performance has soared too - a 4080SUPER card is an absolute compute monster.

The problem is bigger servers, where Intel and AMD's attempt to appeal to the market for huge boxes for data centres means they have no incentive to keep decent 1U servers at the £1000 mark; you're paying more than that for a 1U box with a motherboard in, before sticking in the processors whose pricing starts in the high hundreds and goes to the stratosphere.

Screwed by the cloud: Hardware vendors looking for that raison d'refresh

Tom Womack

Re: Simply add RAM to old servers?

Second-hand DDR4 RAM is incredibly cheap nowadays - I have 288GB in a home-lab machine, ECC DDR4 is down to £1.50/GB second-hand even in 64GB sticks.

Ah, but you're in the cloud so you don't get to make that tradeoff, and the fact that the most recent CPU generation is really quite expensive per CPU is a matter for Amazon's fearsome purchasing department rather than for you.

Tom Womack

If you have any machines left with twelve 15krpm SAS drives in, you probably are saving money and electricity immediately if you replace them with a couple of SAS eight-terabyte SSDs. But since they stopped making 15krpm SAS drives in 2016, if you're in that situation you almost by definition don't care about your storage estate.

Tom Womack

Saving 19% of electricity is a spectacularly trivial bit of opex to try to use to justify an enormous capex - saying that you have to rent an eighth as much expensive DC space would seem more plausible.

A big chunky new dual-Granite-Rapids server might use as much as 2kW; 19% of that is 400W, which is less than £2000 a year even if your industrial electricity contract came from Dewey, Cheatem and Howe; but the server costs at least ten times that and HPE are explicitly saying that you can't expect it to last the decade it would take for the lowered opex to repay the capex.
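The payback arithmetic above can be checked in a few lines. The electricity price of £0.50/kWh and the £20,000 server price are illustrative assumptions (the post only says "less than £2000 a year" and "at least ten times that"); the 2kW draw and 19% saving are from the post.

```python
HOURS_PER_YEAR = 24 * 365


def annual_saving_gbp(server_kw: float, saving_fraction: float,
                      price_per_kwh: float) -> float:
    """Yearly electricity saving for a server running flat-out all year."""
    saved_kw = server_kw * saving_fraction
    return saved_kw * HOURS_PER_YEAR * price_per_kwh


def payback_years(capex_gbp: float, yearly_saving_gbp: float) -> float:
    """Years of saved opex needed to repay the purchase price."""
    return capex_gbp / yearly_saving_gbp


# Assumed inputs: deliberately expensive electricity, cheap-end server price.
saving = annual_saving_gbp(server_kw=2.0, saving_fraction=0.19, price_per_kwh=0.50)
years = payback_years(capex_gbp=20_000, yearly_saving_gbp=saving)
print(f"saving per year: £{saving:.0f}, payback: {years:.1f} years")
```

Even with these generous assumptions the payback comes out at more than a decade, which is the point being made.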

Coder wrote a bug so bad security guards wanted a word when he arrived at work

Tom Womack

Yes, if you ask in September everyone will be in credit. People generally prefer consistent £250/month bills to £50 bills in July and £750 bills in December, so they'll be in credit at the start of winter in much the same way that squirrels will have an enormously positive acorn balance at the start of winter.

Amazon accused of cheating low-income Prime users out of two-day deliveries

Tom Womack

Re: Wait

Amazon Prime is half-price for people on a number of government assistance schemes (https://www.amazon.com/58f8026f-0658-47d0-9752-f6fa2c69b2e2/qualify), non-Prime delivery is quite expensive for small items, and dodgy parts of DC are probably not replete with convenient neighbourhood stores so Amazon may well be the best way to get anything.

Indonesia bans iPhone 16 over Apple’s undelivered investment promises

Tom Womack

What's your source for that Singapore HPC procurement? EPYC 7713 is a 3.5-year-old processor with Zen 3 cores and I'd be really quite surprised if a hundred million dollar procurement worth doing a press release about was using chips that old.

AMD pumps Epyc core count to 192, clocks up to 5 GHz with Turin debut

Tom Womack

Re: Power consumption

Apparently very little - the Phoronix review https://www.phoronix.com/review/amd-epyc-9965-ampereone/5 says that the Turin chip idled at as low as 19 watts (whilst measuring it at 461 watts flat-out)

That's getting to the point where RAM power consumption and the rest of the platform are more significant than the processors, which is really all a processor designer can hope for.

And it indicates AMD are paying attention to this - Ampere's chip which was also examined in the Phoronix review never used as little as 100 watts

https://www.phoronix.com/review/intel-xeon-6980p-power/7 shows (dual-socket) Granite Rapids power consumption between 63 and 1085 watts

Tom Womack

I've had reasonable luck getting low-spec systems from bargainhardware.co.uk then swapping the CPUs out for bigger ex-cloud ones

(though I have an intermittent memory fault on a 104-thread / 288GB Skylake box which is going to be a bit of a bugger to diagnose, dual-Skylake means the memory modules come in sets of twelve and I'd rather have dinner at the Fat Duck twice than buy 12 new 32G modules at £42 each)

A closer look at Intel and AMD's different approaches to gluing together CPUs

Tom Womack

If you're remotely contemplating buying a 16-core package with 512MB L3 cache, you have Oracle, SAP or weird-engineering-CAD licence fees that make the difference between the $4256 9175F (sixteen dice, one core per die) and the $12984 9755 (sixteen dice, eight cores per die) immaterial.

Looking at the price list on Wikipedia what surprises me is the enormous cost from AMD for lots-of-small-cores: you're saving less than $800 in $11000 by going for Zen 5c rather than Zen 5 cores if you want 96 of them, whilst 96 cores of Sierra Forest are $5265.

It probably says something about yield on the Intel 3 process that the seven-working-16-core-blocks-on-the-chip Sierra Forest-SP are $6000, eight-working $8400 and all-nine-working $10200; this suggests the eighteen-working Sierra Forest-AP ones will be prodigiously expensive.

Tom Womack
Boffin

I was rather hoping for a discussion of how AMD and Intel's approaches to interconnect differ - I haven't seen the high-resolution X-rays or SEM images of cut-through-the-middle devices that would make everything clear, but the impression is that AMD is attaching the chiplets to what's basically a multi-layer circuit-board with what are basically BGA balls, and Intel is attaching them to other silicon chips embedded in the board (for which Intel's term is EMIB) with what are more like through-silicon vias.

(I am not completely sure that there isn't a load of L3 cache on those embedded chips in Intel's newer products, I don't quite see how the cache-size arithmetic works out otherwise)

Lebanon now hit with deadly walkie-talkie blasts as Israel declares ‘new phase’ of war

Tom Womack

Re: If I were a world leader or in the administration thereof. . .

This strikes me as an actively evil version of the Encrochat project ... make your adversaries think they need special equipment rather than iMessage, WhatsApp and Signal, and then provide the special equipment.

(though given where Pegasus comes from I would understand Hezbollah being decidedly wary about using standard secure messengers on standard Android and Apple devices because their adversary is known to have zero-day attacks on both OSes)

Tom Womack

Re: If I were a world leader or in the administration thereof. . .

Hezbollah are not idiots and have the technological capability of a mid-sized nation-state's intelligence services working for them; I expect that by yesterday afternoon some of their techies (or some techies in the Iranian embassy in Beirut; difficult to draw a solid line there) had disassembled a Hezbollah issue walkie-talkie and found the modifications, and so Israel blew them up today because Hezbollah would have told everyone to get rid of them tomorrow.

Gelsinger opens up about Intel troubles amid talk of possible split

Tom Womack

Re: Nailed it - almost

Arm was no slouch in the pursuit of codenames - about the most popular page on the internal Wiki was an informally maintained one mapping old codenames to externally-visible product names and explaining that a new codename was 'direct successor to A57' or 'R7 with more reliability features for automotive'.

It makes some degree of sense to pick a codename that looks nothing like a marketing name, use it throughout the codebase, and let marketing decide that what comes after 57 is 72 and what comes after 55 is 510 and then maintain the lookup table; getting into a position where you have to change code in a myriad places and break everyone's unmerged work because of urgent diktats from marketing is to be avoided.

AMD predicts future AI PCs will run 30B parameter models at 100 tokens per second

Tom Womack

18GB of HBM3e (that is, a single 12-high stack of 12Gbit devices) isn't a completely unreasonable thing to put next to the GPU package in a 2027 SoP, and handles AMD's spec already. It's a 'chiplet' 11mm on a side, the same size as the IO die in Ryzen.

(https://www.theregister.com/2023/08/23/sk_hynix_hbm3e_sample_shipped)

Honey, I shrunk the LLM! A beginner's guide to quantization – and testing it

Tom Womack

What is the '4-bit quantisation' actually doing here? Is it switching all the weights to be in the range -8 to 7, or is it (as you are in the paletted-image parrot example) picking sixteen representatives in a clever way and mapping each entry in the matrix to the nearest representative?

The -1/0/1 paper clearly was just using those three weights, but I think it was doing something exotic.
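To make the question concrete, here is a minimal sketch of the two readings of '4-bit quantisation' asked about above. It is not a description of what any particular tool does; the function names are hypothetical, and how the sixteen representatives get picked (the 'clever' part) is left out and taken as an input.

```python
def quantise_uniform(weights):
    """Reading 1: scale the weights, then round each to an integer in -8..7."""
    scale = max(abs(w) for w in weights) / 7 or 1.0  # avoid a zero scale
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantise_uniform(codes, scale):
    """Recover approximate weights from the integer codes."""
    return [c * scale for c in codes]


def quantise_codebook(weights, codebook):
    """Reading 2: store, per weight, the index of the nearest representative."""
    return [
        min(range(len(codebook)), key=lambda i: abs(codebook[i] - w))
        for w in weights
    ]


def dequantise_codebook(indices, codebook):
    """Recover approximate weights by looking the indices up again."""
    return [codebook[i] for i in indices]
```

Both store four bits per weight; the second also has to store the table of representatives, much as a paletted image stores its palette.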

Xen Project in a pickle as colo provider housing test platform closes

Tom Womack

Re: "the Project is not sure its hardware would survive a move"

At least part of OSSTest is a whole lot of devboards, because one of the things that you want to test fairly early on on newly-brought-up hardware is whether the virtualisation works.

Contracting with someone else's movers to move irreplaceable bare PCBs attached to a bit of wood with rack rails screwed to the side is something you'd be very unconfident would work perfectly.

(similarly, OSSTest is entirely a bare-metal thing because it's testing hypervisors, which want to run on bare metal and for which running under qemu on different hardware is not an adequate test)

The question of why all this stuff isn't literally on-premises rather than in a datacenter is an interesting one; presumably the answer is that something of the not-very-big and not-very-corporate scale of Xenproject has trouble owning a big enough bit of property.

Astroboffins order most advanced spectrograph ever to sniff out alien life

Tom Womack

And oxygen in exoplanet atmospheres really is something much better sensed from space telescopes, because making sure that you've subtracted out exactly the contribution from oxygen in our atmosphere and no more or less is implausibly statistically challenging.

I think ANDES is the instrument that, over a decade or so, should be accurate and stable enough to be able to do the awesome experiment 'the red-shifts of these bright quasars have increased because they're moving away' - explicitly seeing the expansion of the universe on the time-scale of a single research career is pretty cool.

Space insurers make record-breaking loss as orbit gets cramped

Tom Womack

Re: An incentive perhaps

Very impractical from geostationary orbit - I did do a conceptual design for a GEO-to-disposal tug for my MSc thesis, but it would have been the biggest set of ion engines ever operated, with about half the solar power of the Space Station and very optimistic assumptions about how much iROSA arrays could be made to weigh, and it required an entire Falcon-9 load of replacement argon, hydrazine and N2O4 per satellite deorbited. The problem is that the disposal burn has to be impulsive and so needs to be chemical, and then you have to do a similar burn the same size so you don't dispose of the tug too. The design was really fragilely dependent on the exact tankage fractions for the argon.

Tom Womack

Re: What type of claims are being paid out?

No, insurers are generally not stupid and will not insure prototypes at a price that any prototype provider is willing to pay.

There were two huge claims in 2023: Viasat 3, where the operator was willing to pay for a fully-expended Falcon Heavy to get it up to GEO as fast as possible and then found that the giant unfolding antenna didn't unfold, and Inmarsat 6 F-2 whose power supply failed to provide power.

Also, one provider of the extremely fiddly Power Processing Units required for operating large ion drives has a reliability issue which means that four big GEO satellites built by Northrop Grumman are running on one PPU and will die if that one does.

Hailo's latest AI chip shows up integrated NPUs and sips power like fine wine

Tom Womack

And the reason you run AI workloads on super-expensive GPUs is precisely that the GPUs have large quantities of extremely fast RAM.

If your RAM is running at 70Gbytes per second, which is a pretty good measured performance from fast DDR5 on current desktop platforms, then even in int4 you're not going to get more than twenty tokens a second out of a 7B model; or two a second out of a 70B model which uses more than 32GB of platform memory.
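The bound above follows from assuming that generating one token means reading every weight from RAM once, so bandwidth divided by model size caps the token rate. A small sketch of that arithmetic (the function name is made up, and the one-read-per-token model is a simplification):

```python
def max_tokens_per_second(params_billion: float, bits_per_weight: int,
                          bandwidth_gb_per_s: float) -> float:
    """Upper bound on tokens/s if all weights are streamed once per token."""
    model_gb = params_billion * bits_per_weight / 8  # billions of bytes
    return bandwidth_gb_per_s / model_gb


print(max_tokens_per_second(7, 4, 70))   # 7B model in int4 at 70 GB/s
print(max_tokens_per_second(70, 4, 70))  # 70B model in int4 at 70 GB/s
```

The 70B case needs 35GB for the weights alone, which is where the "more than 32GB of platform memory" comes from.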

(I don't have a very good idea why the model sizes are 7B, 13B, and 70B, rather than being just below the memory capacities of common GPUs - I'd have guessed that 7B was so you fit the model and a bit of extra data in a 16GB GPU, but the next bigger GPU is 24GB and the one after that 40GB, so I was expecting 11 and 18)

Whistleblower raises alarm over UK Nursing and Midwifery Council's DB

Tom Womack

Re: You just cant get more bread and butter

A dozen clerks is half a million pounds a year; an 800,000-row database fits easily on a Raspberry Pi with an SSD hanging off the USB port.

Creating a single AI-generated image needs as much power as charging your smartphone

Tom Womack

Running DiffusionBee on my M1 Mac mini while looking at powermetrics in another window, I'm getting about 9.5 watts of CPU+GPU power usage for about thirty seconds per image generated, so 0.00008 kilowatt-hours.
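The conversion from watts and seconds to kilowatt-hours, for anyone wanting to plug in their own measurements (the function name is just for illustration):

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 1000 W * 3600 s


def energy_kwh(power_watts: float, seconds: float) -> float:
    """Energy used by a constant power draw over a given time."""
    joules = power_watts * seconds
    return joules / JOULES_PER_KWH


per_image = energy_kwh(power_watts=9.5, seconds=30)
print(f"{per_image:.6f} kWh per image")
```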

Developing AI models or giant GPU clusters? Uncle Sam would like a word

Tom Womack

Re: GPT-4 is a thing

But the GPT-3 paper described the number of parameters and the compute intensity of the training, whilst the GPT-4 paper decided to be deliberately uselessly vague about that to free up pages to fill with useless analysis of 'AI risk' and of how they had crippled the model so that it didn't regurgitate bomb- or drug-making instructions which could be found in moments with an obvious Google search.

UK throws millions at scheme to heat homes with waste energy from datacenters

Tom Womack

Re: Intel

If the technology is more efficient, you can stick more of it in a box to use the same amount of power at the same temperature. There was a brief period where people used individual low-power servers, before realising that collecting jobs using virtual machines onto big machines hosting 256 vCPU in 2U and 1kW was a much more efficient use of the very-expensive space.

Tom Womack

Re: Assumptions

This has been a serious problem in Eastern Europe where district heating was provided by coal-fired power stations or by steelworks uneconomic in a global context. On the whole if people stop wanting to host computers in London we have some more serious problems, and replacing servers with electric resistance heaters is an ugly but effective solution. A one-kilowatt resistor costs about £50 compared to a £50,000 Sapphire-Rapids-plus-H100 server.

SAP user group calls for support deadline reprieve amid hospital billing worries

Tom Womack

Re: Two years to tender

Not really - what's bankrupted Birmingham is spending years underpaying their female employees and then having to find £700 million upfront to pay the second part of the settlement.

£100 million on an Oracle migration that was budgeted at £19 million and didn't work is a drop in the bucket compared to that.

Microsoft billing 3 cents a minute to revisit tedious Teams meetings via API

Tom Womack

This doesn't seem a plausible source of truly eye-watering sums; $1.80 per hour is less than $1400 even if a meeting contrives to be recorded 24/7 for a month.

Maybe there are people using Teams as a back-end for their surveillance cameras, and this should quickly make them stop.
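The monthly ceiling mentioned above is easy to check, working in whole cents to avoid floating-point noise. The rate is the 3 cents a minute from the headline; the function name is hypothetical.

```python
CENTS_PER_MINUTE = 3  # the "3 cents a minute" from the headline


def max_monthly_charge_dollars(days_in_month: int) -> float:
    """Charge if one meeting were recorded around the clock all month."""
    minutes = days_in_month * 24 * 60
    return minutes * CENTS_PER_MINUTE / 100


for days in (30, 31):
    print(days, "days:", max_monthly_charge_dollars(days))
```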

SmartNICs haven't soared so VMware will allow retrofits in old servers

Tom Womack

Is CXL pricing really as keen as claimed here?

I needed a lot of memory in a server last year, so bought 12x16GB DDR4 sticks from bargainhardware for £42 each. The going rate is now £42 per stick for 32GB sticks, if I wanted 1536GB it would be a thousand pounds even if I threw away all the current DDR4.

It seems unlikely that a PCIe card with a controller significantly more complicated than the normal kind and 256GB of brand new memory chips can compete with that on pricing.

Is this a weird artificial market brought into existence by Dell's insane pricing on DDR5?

Academics have 'no confidence' in Edinburgh University's response to its Oracle disaster

Tom Womack

It's a pity that universities do not seem temperamentally suited to 'we had a vast and wide-ranging disaster with our Oracle implementation; we have managed to make it work reasonably well now; we will spin-off the better people in our internal IT group as a consultancy to help other universities have smaller and more confined disasters'.

Yes, UK universities are each a weird thing unto themselves, but they resemble one another rather more than any of them resembles the strongly hierarchically structured Wisconsin widget-works which SAP or Oracle ERP start off expecting to model.

A particular CAPSA problem seemed to be that 'a person capable of signing off expenses' and 'a thing against which expenses could be signed off' were both very heavyweight structures because there were expected to be about six of them in the company, whilst in a university every researcher got to be capable of signing off expenses against their own separate grant.

Arm still strong despite SoftBank loss as shipments pass a quarter of a trillion

Tom Womack

You don't have to. But releasing a marginal improvement annually still means that people replacing their 2018 smartphone with a 2024 smartphone get a nicer experience than the people who replaced their 2015 smartphone with a 2021 smartphone, and so on; as long as any chip company proceeds to the next ARM core, their competitors will be obliged either to do the same or to lower their prices. You can still get a brand new Samsung phone with eight A55 cores.

After long delays, Sapphire Rapids arrives, full of accelerators and superlatives

Tom Womack

The chips have been out with live hyperscale customers for a year, end-users willing to sign NDAs have had access to them in some of the clouds for three months, they have gone through twelve separate steppings, Intel has taken a half-billion-dollar charge on unshippable product (and is currently rearranging itself internally so that the processor group doesn't get to run new steppings through the fab at its own convenience); I think it is fair to call them battle-hardened now.

It's time to retire 'edge' from our IT vocabulary

Tom Womack

I'd always thought of 'edge' as 'can we run this chunky calculation on the nice fast ARM core that the user has already paid for in their smartphone, rather than on a no-better ARM core that AWS is charging us four cents an hour for' - if you're procuring new hardware for edge then you're doing it wrong.

Longstanding bug in Linux kernel floppy handling fixed

Tom Womack

I bought a new USB CD/DVD drive last Christmas, because I wanted to rip my Christmas CDs and my Mac Mini M1, whilst wonderful in almost all ways, lacks an optical drive. For as long as media is distributed on shiny discs, people will want to watch the shiny discs on computers ...

Japanese cubesat sends home pics from the far side of the Moon

Tom Womack

Re: Batteries not included

And Artemis was a particularly bad setup for cubesats, because they were installed in July 2021 and then there was no access to them, even for battery recharging, until the launch sixteen months later. 80% functionality rate for things stuck in a drawer for sixteen months is not bad ...

Minecraft's 'first luxury goods collection' features real-world $3,000 Burberry coat

Tom Womack

It's a perfectly normal high-quality overpriced beige coat from the front (https://us.burberry.com/monogram-motif-waterloo-trench-coat-p80647751) but with a really surprisingly ugly white creeper-face design in the small of the back.

I was thinking Burberry might be charging that much for a Burberry-check in-game skin, which would have been a bit more than averagely silly at the height of the NFT nonsense and absolutely ridiculous now.

Qualcomm: Arm threatens to end CPU licensing, charge device makers instead

Tom Womack

Re: For those unfamiliar with Qualcomm lawyers..

In particular, Qualcomm are accusing Arm of wanting to move to a per-device licensing model *like Qualcomm's*.

The 'no Arm extra IP blocks without an Arm CPU' part doesn't seem completely unreasonable, though I think Intel did do an x86-with-Mali SoC at one point - if Arm are willing to miss out on Mali revenue because they are worried about RISCV+Mali SoCs that's up to them. The claim that they might be moving to 'no non-Arm IP blocks on a chip with an Arm CPU' is obviously complete nonsense since Arm don't make memory or PCIe controllers.

HPE supercomputer to tell Singapore that it's hot, humid, probably going to rain

Tom Womack

That seems a tiny machine to be boasting about

0.4 petaflops, so a fifth the compute of the smallest machine in the June 2022 top500, and with 196 processors and no GPUs I would be startled if it took more than one rack (Frontier packs 128 processors and 512 GPUs per rack). I am surprised HPE bothered putting out a press release!

There are three larger supercomputers announced on the top500 in Singapore already.

Rambus offers chip designers a drop-in PCIe 6.0 subsystem

Tom Womack

In what form does this PCIe 6.0 Interface Subsystem come? Any PHY with fast SERDES is basically analogue design and very deeply process-specific at the moment, it would be nice to know what processes the blocks are available for.

(the press release at Rambus just says 'on advanced process nodes'; it would be an interesting insight into the fabrication industry to know whether that includes Intel Foundry Services and Samsung's offerings, or just means TSMC N5 and N3)

Meta pours cash into servers and AI as ad revenue falls

Tom Womack

Oh no, an extra year before fanciest-available datacentre CPUs trickle down through the second-hand market to add to the medium-performance computing facility in my garage.

On the other hand, an enforced one-year gap on computer acquisition just as electricity bills are tripling is probably not the worst plan.

Getting that syncing feeling after an Exchange restore

Tom Womack

If the purpose of email is mostly to organise meetings, then being able to send emails which are instantly accepted or rejected into a calendar, and where 'pick a time slot where the recipients are all free' is a function of the mail client, is incredibly useful.

Arm says its Cortex-X3 CPU smokes this Intel laptop silicon

Tom Womack

Re: Girding of Loins

They've already done it - Amazon has warehouses full of its c7g Arm-based units, Apple sells Arm processors by the million.

Or is the only interpretation of "taking on Intel head-on" that you'd accept one in which Arm itself sells physical objects in the retail market to plug into sockets on motherboards, which they've been absolutely clear for twenty years they're never going to do?

Broadcom to 'focus on rapid transition to subscriptions' for VMware

Tom Womack

Re: Software as service once again

KVM? Xen? AWS is Xen, Azure is Hyper-V, Google Cloud is KVM ...

Tesla disables in-car gaming feature that allowed play while MuskMobiles were in motion

Tom Womack

At the price charged for Teslas, and the number of GPUs they have in them anyway, wouldn't a second touchscreen in the front mounted somewhere that the passenger can see it but the driver not be a sensible approach? It would also allow the driver's touchscreen to be moved somewhere that the driver isn't having to look away from the centreline to see it.

OpenBSD disables Intel’s hyper-threading over CPU data leak fears

Tom Womack

Re: A Kludge

The whole point of hyper threading is that it works well on code which *hasn't* been thoroughly optimised. If you've got two vector instructions lined up for each tick, hyper threading can't get you anything; if your thread is waiting two hundred ticks for the L3 cache to divulge the next operand, having another thread running until it too needs to wait for the L3 cache is extremely helpful.

I am very sad at the way that people are using fairly hypothetical security arguments to disable the features that make processors actually good at computing: I am willing to do my banking on my phone if that means my actual computers can crunch numbers at a higher percentage of their peak speed.

Tech rookie put decimal point in wrong place, cost insurer zillions

Tom Womack

Re: Lira?

I was, accidentally, there for the changeover; all the banks and ATMs were closed for a couple of days, which was somewhat irritating since I'd turned up without any cash because Romanian lei are hard to come by.

Particularly irritating was that, when the ATMs reopened, they were still dispensing the old notes!

Close Encounters of the Kuiper Belt kind: New Horizons to come within just 3,500km of MU69

Tom Womack

Re: It is a long way away from the sun

It's about 45AU from the Sun, so sunlight is two thousand times fainter than on Earth - but that's still about a hundred times brighter than full-moonlight, and with a camera on a good tripod you can take pretty good photos in full moonlight.
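The 'two thousand times fainter' figure is just the inverse-square law. A quick sketch follows; the Sun-to-full-Moon brightness ratio of roughly 400,000 is an assumed round number (about 14 magnitudes), not something stated in the post.

```python
SUN_TO_FULL_MOON_RATIO = 400_000  # assumption: roughly 14 magnitudes


def sunlight_dimming_factor(distance_au: float) -> float:
    """How many times fainter sunlight is than at 1 AU (inverse-square law)."""
    return distance_au ** 2


def brightness_vs_full_moon(distance_au: float) -> float:
    """Sunlight at this distance, as a multiple of full moonlight on Earth."""
    return SUN_TO_FULL_MOON_RATIO / sunlight_dimming_factor(distance_au)


print(sunlight_dimming_factor(45))
print(round(brightness_vs_full_moon(45)))
```

With that assumed ratio the answer comes out nearer two hundred than one hundred, which is the same order of magnitude and leaves the conclusion about photography unchanged.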
