Interesting. This coincides with Intel poaching the AMD GPU head.
Intel has scrapped Knights Hill, an upcoming addition to its high-end many-core Xeon Phi chip family, and will go back to the drawing board for its microarchitecture. We heard at the end of last month that the Xeon Phi gang was, in the words of one well-placed semiconductor industry source, "not long for this world," which we …
I suspect the cause is 10nm being delayed rather than the AMD partnership.
IMHO, a paper release of Knights Hill would probably have been more damaging than cancelling it and releasing Knights Mill instead: with a paper launch, people delay purchases and then move to another vendor, rather than just moving their urgent requirements to another vendor now and potentially considering Intel again in the future.
To recap the Xeon Phi line: it's not for your common or garden server, workstation or desktop. It's aimed at supercomputer gear, with machine code instructions to dash through operations on matrices and other blobs of data at high speed in parallel.
So give them what they really want: GPUs and FPGAs.
Meanwhile, as the article notes at the end, China is now building its own supercomputers using its own silicon.
Sounds like some kind of Nervana to me...
From Wikipedia: the Sunway TaihuLight uses a total of 40,960 Chinese-designed SW26010 manycore 64-bit RISC processors based on the Sunway architecture. ARM and RISC give customers options they didn't have a few years ago. Even Intel has started making noises about custom silicon and FPGAs…
PS. I think you mean Nirvana…
"Meanwhile, as the article notes at the end, China is now building its own supercomputers using its own silicon."
No doubt using ideas and concepts stolen from the chips of the moronic US chipmakers who outsourced fabrication to them. But hey, I'm sure saving a few pennies meant a better dividend for the shareholders, what's not to like? Can just imagine the CEOs: "Intellectual property theft? Major technological and geopolitical shift to China? Meh, I'm too old to care, let the next generation suffer the consequences. I'm off down the golf course."
No doubt using ideas and concepts stolen from the chips of the moronic US chipmakers who outsourced fabrication to them.
You learn surprisingly little from reverse-engineering a working chip: sure, you might be able to make a copy of it, provided that you have a compatible manufacturing process - but you still do not know why things are done the way they are, or what trade-offs were made in the design. So you still can't make a better, or just different, chip without repeating a good few of the mistakes the original designers made.
Just ask the Soviets - they were quite amazing at duplicating IBM and DEC computer designs for decades - and still had very little clue about what they were doing, or why.
The reason China can come up with their own, competitive supercomputer designs is because they've invested massively in basic science, engineering, and education. Sure, they -also- would reverse-engineer and steal the ideas when they could [frankly, it would be stupid not to take a peek at the most advanced designs you can lay your hands on: only an idiot would refuse to learn from somebody who's better than you at something you are interested in] - but they would not have been able to either understand or improve the designs if they weren't just a single step behind.
And now in some aspects they are ahead - so soon it will be our turn to reverse-engineer Chinese designs and try to learn from them. Or, of course, we could persist in the decades-old prejudices and let ourselves fall hopelessly behind. Dealer's choice.
It's more likely that the Chinese copy ideas rather than whole designs; it only takes one person employed by a Western chip designer to pass on confidential information, saving them years of R&D.
Inventing new architectures to improve single-core performance is high risk, as both Intel and AMD have found to their cost with Itanic, the P4 and AMD FX. Conversely, scaling up the number of cores in a supercomputer is lower risk.
The performance optimisations of modern CPUs require vastly more research investment than when the Russians copied designs back in the Cold War. Unless some radical new idea emerges, it's unlikely the Chinese will improve single-core CPU performance much, which is why they're concentrating their efforts on massive parallelism in supercomputers.
It seems to me that, for HPC workloads, having many x86 cores is a waste of silicon, because once the workload is running all you really want are the arithmetic processing units, supported by I/O to feed them and a scheduler to keep 'em busy - something along the lines of a general-purpose core, such as x86, but with many complete FPUs per core instead of just one. I gather that MMX/SSE was a sort of effort in that direction, but the MMX hardware elements were more limited than a full FPU.
It really does depend on exactly what you're doing with an HPC.
If you're doing any type of simulation, then HPC comes down as much to communication - shunting data around between processors/nodes - as it does to computation.
The flow is generally a computation cycle followed by a communication cycle to prepare for the next computation cycle.
Until you specialize your communications into silicon, moving data around is much better done using a general-purpose CPU than an FPU/APU.
A proper HPC system is a balance of multiple different technologies.
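To make that compute-then-communicate flow concrete, here is a minimal sketch in C using MPI, assuming a simple 1-D domain decomposition; the array name, sizes and dummy stencil are invented for illustration and are not from any particular code.

```c
/* Minimal sketch of the compute/communicate cycle described above.
 * Assumes a 1-D domain split across MPI ranks; names and sizes are
 * illustrative only. Compile with: mpicc -O2 halo.c -o halo */
#include <mpi.h>
#include <stdio.h>

#define N     1024      /* local grid points per rank (made up)      */
#define STEPS 100       /* number of compute/communicate iterations  */

int main(int argc, char **argv)
{
    int rank, size;
    double u[N + 2];    /* local data plus one halo cell at each end */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 0; i < N + 2; i++)
        u[i] = (double)rank;            /* dummy initial data */

    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    for (int step = 0; step < STEPS; step++) {
        /* Computation cycle: update the interior points (dummy stencil). */
        for (int i = 1; i <= N; i++)
            u[i] = 0.5 * (u[i - 1] + u[i + 1]);

        /* Communication cycle: exchange halo cells with the neighbours
         * so the next computation cycle has up-to-date boundary data. */
        MPI_Sendrecv(&u[N], 1, MPI_DOUBLE, right, 0,
                     &u[0], 1, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  1,
                     &u[N + 1], 1, MPI_DOUBLE, right, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    if (rank == 0)
        printf("done after %d steps on %d ranks\n", STEPS, size);

    MPI_Finalize();
    return 0;
}
```

MPI_Sendrecv is used here simply because it sidesteps deadlock-ordering worries; real codes often use non-blocking sends and receives so the exchange can overlap with the next chunk of computation.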
Unfortunately, the Xeon Phi just didn't catch on in a way Intel would have liked – it can be tricky to program that many cores efficiently, it's restricted to niche HPC projects ...
The Knights Landing systems are actually very easy to program - the machine appears to the end user as a sizeable SMP box, which can run a bog-standard x86_64 development environment, and do so at an acceptable speed. My Phi development box runs a standard install of openSUSE, with a normal X11 head, and the usual Intel development stack - exactly the same as I use for "normal" x86_64 development. If you don't like Intel tools (or don't want to pay for them), gcc generates pretty decent Phi code starting with version 6.
In terms of ease of development and code efficiency, the Knights Landing really is excellent - I haven't had that much fun writing massively multithreaded code since the SGI Origin, and that I had to share with many people. It is certainly much more programmer-friendly than Nvidia's or AMD's GPU offerings.
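As an illustration of that SMP-style programming model, here is a plain OpenMP loop in C; the problem size and compile flags are illustrative assumptions, not taken from the comment above, but any recent gcc (version 6 or later, per the comment) should build it for the Phi.

```c
/* Ordinary OpenMP code runs unmodified on a Knights Landing host, just
 * with many more threads. Illustrative example only; compile with
 * something like: gcc -O3 -fopenmp -march=knl dot.c -o dot */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const size_t n = 1 << 24;           /* arbitrary problem size */
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {    /* dummy data */
        a[i] = 1.0;
        b[i] = 2.0;
    }

    /* One ordinary parallel loop; OpenMP spreads it over however many
     * hardware threads the machine exposes, whether that's a handful on
     * a laptop or a few hundred on a Phi. */
    #pragma omp parallel for reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];

    printf("threads=%d sum=%g\n", omp_get_max_threads(), sum);
    free(a);
    free(b);
    return 0;
}
```

The same source runs unchanged on an ordinary Xeon; on a Knights Landing it simply fans out across far more hardware threads.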
The Phi's Achilles heel is not the development experience or the absolute performance - it is pricing. For codes which are similar in performance characteristics to SPEC CPU (and that's a good chunk of HPC codes - including mine), Intel's pricing for the lower-end Phis works out at almost exactly the same $/usable flop as mid-range Xeons. So if you have a fixed upfront budget, but can tolerate about 50% higher utility costs (which is often the case for small and medium-sized HPC facilities), you can get considerably better maximum performance and more flexibility by going the regular Xeon (and now Zen) route.
Major, national-scale facilities are different, of course - but this is not where most of the market is.
Yes, it looked nice, but the upfront price was a bit of a killer. And CUDA has a sort of persistent cool factor; people seem quite keen to do CUDA, while not so big on normal threading. (That, and it doesn't have the sheer number of processing units a GPU has, so I think there was a bit of uncertainty because it sat in that mid-range.)
There are other factors in play too. Some very specialised systems care about Teraflops/foot^3. CPUs are actually quite useful here, because GPUs tend to need a CPU to look after them anyway. Phi is also pretty good in that space.
CPU systems are also very good for streaming applications; data can arrive via some DMA from a data acquisition system, and is then right there in memory ready to be processed, immediately.
In contrast, GPUs have their own memory subsystem, no I/O other than the PCIe interface, and no I/O-friendly API allowing another device to DMA data directly to them; the data has to hop through the CPU's memory space first so that CUDA can then deal with the transfer to the GPU card. This means that once the data has arrived in the system you can't just get on with it; there's still more memory shuffling to do.
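A rough sketch of that extra hop, written in C against the CUDA runtime API (the buffer names and sizes are invented for the example): even after the acquisition hardware has DMA'd its samples into host memory, there is still a second copy across PCIe before the GPU can touch them.

```c
/* Sketch of the extra memory hop described above: the acquisition
 * device can only DMA into host memory, so the data must then be
 * copied over PCIe into GPU memory before any GPU work can start.
 * Names and sizes are illustrative. Link with -lcudart. */
#include <cuda_runtime_api.h>
#include <stdio.h>
#include <string.h>

#define SAMPLES (1 << 20)

int main(void)
{
    float *host_buf;    /* where the acquisition hardware would DMA to */
    float *dev_buf;     /* GPU-side copy that kernels can actually use  */

    /* Pinned host memory makes the later PCIe copy faster, but it is
     * still a second copy on top of the original DMA into the host. */
    cudaMallocHost((void **)&host_buf, SAMPLES * sizeof(float));
    cudaMalloc((void **)&dev_buf, SAMPLES * sizeof(float));

    /* Stand-in for the data-acquisition DMA landing in host memory. */
    memset(host_buf, 0, SAMPLES * sizeof(float));

    /* The extra hop: host memory -> GPU memory across PCIe. Only after
     * this completes can a kernel be launched on the data. */
    cudaMemcpy(dev_buf, host_buf, SAMPLES * sizeof(float),
               cudaMemcpyHostToDevice);

    printf("copied %d samples to the GPU\n", SAMPLES);

    cudaFree(dev_buf);
    cudaFreeHost(host_buf);
    return 0;
}
```

On the CPU-only setup described above, processing could begin as soon as the first DMA landed; here the work has to wait for the second copy.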
So a well set up, well programmed CPU system can hum along at near 100% utilisation 24/7 (I've done several like that), whilst a GPU is inevitably processing in bursts, playing catch-up.
Nvidia are slowly learning this, with NVLink beginning to be useful; still a long way to go though. Once they've learned about DMA patterns, addressing, and crossbar switches for NVLink, that's when it'll actually start being genuinely useful. But that's also when it'll become just another network technology that costs as much as Ethernet to develop, but without the mass-market appeal...
The x200 (Knights Landing) Xeon Phis are full x86_64 CPUs, with all the usual gubbins (but sans virtualization and enterprise-related bits). They can boot unmodified x86_64 Linux kernels, and will run most binaries compiled for modern Xeons (up to and including avx2). It might also boot Windows (never tried, and don't care) - but the licensing cost may be seriously interesting.
Each core is slower than a modern Xeon core (by a factor of 2x-3x when running unmodified Xeon binaries), but there are a lot of them (64 to 72, depending on the SKU). In real-life usage, you will probably want to recompile to use the AVX512 vector units; in favorable cases this can improve the FLOPS rate by a factor of 4x.
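To illustrate what "recompile to use the AVX512 vector units" tends to mean in practice, here is a small C sketch; the function name, data and compile flags are made-up examples rather than anything from the comment, but with flags along the lines of gcc -O3 -fopenmp-simd -march=knl the compiler can turn the loop into 512-bit vector instructions that handle eight doubles at a time.

```c
/* Illustrative sketch of the "recompile for AVX512" point: the same C
 * code, rebuilt with e.g. gcc -O3 -fopenmp-simd -march=knl, lets the
 * compiler vectorise the loop with 512-bit instructions. Names are
 * made up for the example. */
#include <stdio.h>
#include <stdlib.h>

/* restrict tells the compiler the arrays do not overlap, which helps
 * it prove the loop is safe to vectorise. */
static void daxpy(size_t n, double a,
                  const double *restrict x, double *restrict y)
{
    /* omp simd is a hint; with -fopenmp-simd gcc will vectorise this
     * loop without enabling full OpenMP threading. */
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    const size_t n = 1 << 20;           /* arbitrary problem size */
    double *x = malloc(n * sizeof *x);
    double *y = malloc(n * sizeof *y);

    for (size_t i = 0; i < n; i++) {    /* dummy data */
        x[i] = 1.0;
        y[i] = 2.0;
    }

    daxpy(n, 3.0, x, y);
    printf("y[0]=%g\n", y[0]);

    free(x);
    free(y);
    return 0;
}
```

How much this buys you depends entirely on how vectorisable the hot loops are, which is presumably why the comment hedges with "in favorable cases".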
The package also comes with 16GB of on-package L3 cache, which is quite fast (400+ GB/s of application-visible bandwidth), and six DDR4 memory channels, so you can go up to 384GB. It will take standard PCIe peripherals, and the standard Linux drivers work just fine.
So it is a quite serious x86_64 CPU. Unfortunately, the prices are also rather non-funny (See https://ark.intel.com/products/series/92650/Intel-Xeon-Phi-x200-Product-Family), and given the low volume you are unlikely to do much better than the RCP. If you can't buy in the US, the usual 1:1 USD to EUR/GBP conversion rates make it worse.
All my friends were BSD and Sun types. What is the overwhelming advantage of the Linux kernel in this area?
It already works well enough in most situations, and supports all the bits one might usually need. Oh, and there is no per-core or per-socket licensing cost - that would be a killer.
HPC tends to be pretty pragmatic as far as the O/S choice is concerned - as long as some variation of POSIX is supported, it is usually possible to convince most HPC codes to run. I never had to use BSD in anger, but SunOS and Solaris were/are fairly pleasant for HPC - and so were/are Irix, AIX, and UNICOS. Windows, on the other hand ...
So no Phi crossover? What is Intel going to do about x86? A company that thought more was more - not less - performance, and wasn't going to be strong-armed into a leaner architecture.
Well, the solution is to use existing binary translation software, like that used to run x86 binaries on Arm or other architectures, and design a better processor ISA - one that its own translation modules perform best at - and then you don't need x86. (Oh, that part's happened already.)
Lenovo has inked an agreement with Spain's Barcelona Supercomputing Center for research and development work in various areas of supercomputer technology.
The move will see Lenovo invest $7 million over three years into priority sectors in high-performance computing (HPC) for Spain and the EU.
The agreement was signed this week at the Barcelona Supercomputing Center-National Supercomputing Center (BSC-CNS), and will see Lenovo and the BSC-CNS try to advance the use of supercomputers in precision medicine, the design and development of open-source European chips, and developing more sustainable supercomputers and datacenters.
The US Department of Energy is looking to vendors that will help build supercomputers up to 10 times faster than the recently inaugurated Frontier exascale system to come on stream between 2025 and 2030, and even more powerful systems than that for the 2030s.
These details were disclosed in a request for information (RFI) issued by the DoE for computing hardware and software vendors, system integrators and others to "assist the DoE national laboratories (labs) to plan, design, commission, and acquire the next generation of supercomputing systems in the 2025 to 2030 time frame."
Vendors have until the end of July to respond.
Predicting the weather is a notoriously tricky enterprise, but that’s never held back America's National Oceanic and Atmospheric Administration (NOAA).
After more than two years of development, the agency brought a pair of supercomputers online this week that it says are three times as powerful as the machines they replace, enabling more accurate forecast models.
Developed and maintained by General Dynamics Information Technology under an eight-year contract, the Cactus and Dogwood supers — named after the flora native to the machines' homes in Phoenix, Arizona, and Manassas, Virginia, respectively — will support larger, higher-resolution models than previously possible.
Analysis Supermicro launched a wave of edge appliances using Intel's newly refreshed Xeon-D processors last week. The launch itself was nothing to write home about, but a thought occurred: with all the hype surrounding the outer reaches of computing that we call the edge, you'd think there would be more competition from chipmakers in this arena.
So where are all the AMD and Arm-based edge appliances?
A glance through the catalogs of the major OEMs – Dell, HPE, Lenovo, Inspur, Supermicro – returned plenty of results for AMD servers, but few, if any, validated for edge deployments. In fact, Supermicro was the only one of the five vendors that even offered an AMD-based edge appliance – which used an ageing Epyc processor. Hardly a great showing from AMD. Meanwhile, just one appliance from Inspur used an Arm-based chip from Nvidia.
Exclusive A court case which would have seen Atos take on the UK government over an £854 million (c $1 billion) supercomputer contract for the Meteorological Office has ended before it began.
The case, Atos Services UK Ltd v Secretary of State for Business, Energy, and Industrial Strategy and The Meteorological Office, concerns an agreement last year between the Met Office and Microsoft to provision a new supercomputer to "take weather and climate forecasting to the next level."
The system is intended to be the world's most advanced weather and climate system, and was expected to be twice as powerful as any other supercomputer in the UK when it becomes operational in the summer.
In yet another sign of how fortunes have changed in the semiconductor industry, Taiwanese foundry giant TSMC is expected to surpass Intel in quarterly revenue for the first time.
Wall Street analysts estimate TSMC will grow second-quarter revenue 43 percent quarter-over-quarter to $18.1 billion. Intel, on the other hand, is expected to see sales decline 2 percent sequentially to $17.98 billion in the same period, according to estimates collected by Yahoo Finance.
The potential for TSMC to surpass Intel in quarterly revenue is indicative of how demand has grown for contract chip manufacturing, fueled by companies like Qualcomm, Nvidia, AMD, and Apple who design their own chips and outsource manufacturing to foundries like TSMC.
Intel has found a new way to voice its displeasure over Congress' inability to pass $52 billion in subsidies to expand US semiconductor manufacturing: withholding a planned groundbreaking ceremony for its $20 billion fab mega-site in Ohio that stands to benefit from the federal funding.
The Wall Street Journal reported that Intel was tentatively scheduled to hold a groundbreaking ceremony for the Ohio manufacturing site with state and federal bigwigs on July 22. But, in an email seen by the newspaper, the x86 giant told officials Wednesday it was indefinitely delaying the festivities "due in part to uncertainty around" the stalled Creating Helpful Incentives to Produce Semiconductors (CHIPS) for America Act.
That proposed law authorizes the aforementioned subsidies for Intel and others, and so its delay is holding back funding for the chipmakers.
Comment Intel has begun shipping its cryptocurrency-mining "Blockscale" ASIC slightly ahead of schedule, and the timing could not be more unfortunate as digital currency values continue to plummet.
Raja Koduri, the head of Intel's Accelerated Computing Systems and Graphics group, tweeted Wednesday the company has started initial shipments of the Blockscale ASIC to crypto-mining firms Argo Blockchain, Hive Blockchain and Griid:
Intel is claiming a significant advancement in its photonics research with an eight-wavelength laser array that is integrated on a silicon wafer, marking another step on the road to on-chip optical interconnects.
This development from Intel Labs will enable the production of an optical source with the required performance for future high-volume applications, the chip giant claimed. These include co-packaged optics, where the optical components are combined in the same chip package as other components such as network switch silicon, and optical interconnects between processors.
According to Intel Labs, its demonstration laser array was built using the company's "300-millimetre silicon photonics manufacturing process," which is already used to make optical transceivers, paving the way for high-volume manufacturing in future. The eight-wavelength array uses distributed feedback (DFB) laser diodes, which apparently refers to the use of a periodically structured element or diffraction grating inside the laser to generate a single frequency output.
Having successfully appealed Europe's €1.06bn ($1.2bn) antitrust fine, Intel now wants €593m ($623.5m) in interest charges.
In January, after years of contesting the fine, the x86 chip giant finally overturned the penalty, and was told it didn't have to pay up after all. The US tech titan isn't stopping there, however, and now says it is effectively seeking damages for being screwed around by Brussels.
According to official documents [PDF] published on Monday, Intel has gone to the EU General Court for "payment of compensation and consequential interest for the damage sustained because of the European Commission's refusal to pay Intel default interest."