Intel drags Xeon Phi Knights Hill chips out back... two shots heard

Intel has scrapped Knights Hill, an upcoming addition to its high-end many-core Xeon Phi chip family, and will go back to the drawing board for its microarchitecture. We heard at the end of last month that the Xeon Phi gang was, in the words of one well-placed semiconductor industry source, "not long for this world," which we …

  1. Voland's right hand Silver badge

    2+2=?

    Interesting. This coincides with Intel poaching AMD's GPU chief.

    1. Sgt_Oddball Silver badge

      Re: 2+2=?

      2+2=3/4/5 depending on rounding errors.

      1. PNGuinn

        Re: 2+2=?


        This is chipzilla, remember.

    2. Roj Blake
      Big Brother

      Re: 2+2=?

      According to a nice Mr O'Brien I once met, 2+2=5.

    3. theblackhand

      Re: 2+2=?

      I suspect the cause is 10nm being delayed rather than the AMD partnership.

      IMHO, a paper release of Knights Hill would probably have been more damaging than cancelling it and releasing Knights Mill instead: with a paper launch, people delay purchases and then move to another vendor, rather than just moving their urgent requirements to another vendor now and potentially considering Intel again in the future.

  2. Charlie Clark Silver badge

    To recap the Xeon Phi line: it's not for your common or garden server, workstation or desktop. It's aimed at supercomputer gear with machine code instructions to dash through operations on matrices and other blobs of data at high speed in parallel.

    So give them what they really want: GPUs and FPGAs.

    Meanwhile, as the article notes at the end, China is now building its own supercomputers using its own silicon.

    1. Korev Silver badge

      So give them what they really want: GPUs and FPGAs.

      Sounds like some kind of Nervana to me...

      1. Charlie Clark Silver badge

        Sounds like some kind of Nervana to me...

        From Wikipedia: the Sunway TaihuLight uses a total of 40,960 Chinese-designed SW26010 manycore 64-bit RISC processors based on the Sunway architecture. ARM and RISC give customers options they didn't have a few years ago. Even Intel has started making noises about custom silicon and FPGAs…

        PS. I think you mean Nirvana…

        1. defiler

          PS. I think you mean Nirvana…

          I'm going with Nerdvana.


          A self-confessed nerd.

        2. Korev Silver badge

          PS. I think you mean Nirvana…

          Nope :)

        3. joeldillon

          Uhh. RISC has been around since like the mid 80s? It's been an option since Intel was churning out 386s.

          1. Charlie Clark Silver badge

            Uhh. RISC has been around since like the mid 80s? It's been an option since Intel was churning out 386s.

            Not RISC versus CISC, but real RISC designs from MIPS, I think.

    2. Anonymous Coward
      Anonymous Coward

      "Meanwhile, as the article notes at the end, China is now building its own supercomputers using its own silicon."

      No doubt using ideas and concepts stolen from the chips of the moronic US chipmakers who outsourced fabrication to them. But hey, I'm sure saving a few pennies meant a better dividend for the shareholders, what's not to like? Can just imagine the CEOs: "Intellectual property theft? Major technological and geopolitical shift to China? Meh, I'm too old to care, let the next generation suffer the consequences. I'm off down the golf course"

      1. Anonymous Coward
        Anonymous Coward

        No doubt using ideas and concepts stolen from the chips of the moronic US chipmakers who outsourced fabrication to them.

        You learn surprisingly little from reverse-engineering a working chip: sure, you might be able to make a copy of it, provided you have a compatible manufacturing process - but you still do not know why things are done the way they are, and what trade-offs were made in the design. So you still can't make a better, or just different, chip without repeating a good few of the mistakes the original designers made.

        Just ask the Soviets - they were quite amazing at duplicating IBM and DEC computer designs for decades - and still had very little clue about what they were doing, or why.

        The reason China can come up with their own, competitive supercomputer designs is because they've invested massively in basic science, engineering, and education. Sure, they -also- would reverse-engineer and steal the ideas when they could [frankly, it would be stupid not to take a peek at the most advanced designs you can lay your hands on: only an idiot would refuse to learn from somebody who's better than you at something you are interested in] - but they would not have been able to either understand or improve the designs if they weren't just a single step behind.

        And now in some aspects they are ahead - so soon it will be our turn to reverse-engineer Chinese designs and try to learn from them. Or, of course, we could persist in the decades-old prejudices and let ourselves fall hopelessly behind. Dealer's choice.

        1. Anonymous Coward
          Anonymous Coward

          It's more likely that the Chinese copy ideas rather than whole designs; it only takes one person employed by a Western chip designer to pass on confidential information, saving them years of R&D.

          Inventing new architectures to improve single core performance is high risk as both Intel and AMD have found to their cost with Itanic / P4 / AMD FX. Conversely scaling up the number of cores in a supercomputer is lower risk.

          The performance optimisations of modern CPUs require vastly more research investment than when the Russians copied designs back in the Cold War. Unless some radical new idea emerges, it's unlikely the Chinese will improve single-core CPU performance much, which is why they're concentrating their efforts on massive parallelism in supercomputers.

    3. Anonymous Coward
      Anonymous Coward

      So give them what they really want

      It seems to me that, for HPC workloads, having many x86 cores is a waste of silicon because once the workload is running all you really want are the Arithmetic Processing Units, supported by I/O to feed them and a scheduler to keep 'em busy - something along the lines of a general purpose core, such as x86, but with many complete FPUs per core instead of just one. I gather that MMX/SSE was a sort of effort in that direction but the MMX h/w elements were more limited than a full FPU.

      1. Peter Gathercole Silver badge

        Re: So give them what they really want

        It really does depend on exactly what you're doing with an HPC.

        If you're doing any type of simulation, then HPC comes down as much to communication and shunting data around between processors/nodes as it is computation.

        The flow is generally a computation cycle followed by a communication cycle to prepare for the next computation cycle.

        Until you specialise your communications into silicon, moving data around is much better done by a general-purpose CPU than an FPU/APU.

        A proper HPC system is a balance of multiple different technologies.

  3. Anonymous Coward
    Anonymous Coward

    Xeon Phi

    Unfortunately, the Xeon Phi just didn't catch on in a way Intel would have liked – it can be tricky to program that many cores efficiently, it's restricted to niche HPC projects ...

    The Knights Landing systems are actually very easy to program - they appear to the end user as a sizeable SMP box, which can run a bog-standard x86_64 development environment, and do so at an acceptable speed. My Phi development box runs a standard install of openSUSE, with a normal X11 head, and the usual Intel development stack - exactly the same as I use for "normal" x86_64 development. If you don't like Intel's tools (or don't want to pay for them), gcc generates pretty decent Phi code starting with version 6.

    In terms of ease of development and code efficiency the Knights Landing really is excellent - I haven't had this much fun writing massively multithreaded code since the SGI Origin, and that one I had to share with many people. It is certainly much more programmer-friendly than Nvidia's or AMD's GPU offerings.

    The Phi's Achilles heel is not development or absolute performance - it is pricing. For codes with performance characteristics similar to SPEC CPU (and that's a good chunk of HPC codes - including mine), Intel's pricing for the lower-end Phis works out at almost exactly the same $/usable flop as mid-range Xeons. So if you have a fixed upfront budget but can tolerate roughly 50% higher utility costs (which is often the case for small and medium-sized HPC facilities), you can get considerably better maximum performance and more flexibility by going the regular Xeon (and now Zen) route.

    Major, national-scale facilities are different, of course - but this is not where most of the market is.

    1. Korev Silver badge

      Re: Xeon Phi

      An interesting point about the pricing. At ISC Intel were saying that they were as fast as Nvidia for things like Molecular Dynamics; only cheaper :)

    2. ibmalone Silver badge

      Re: Xeon Phi

      Yes, it looked nice, but the upfront price was a bit of a killer. And CUDA has a sort of persistent cool factor: people seem quite keen to do CUDA, while not so big on normal threading. (That, and it doesn't have the number of processing units a GPU has, so I think there was a bit of uncertainty because it sat in that mid-range.)

    3. Anonymous Coward
      Anonymous Coward

      Re: Xeon Phi

      There are other factors in play too. Some very specialised systems care about Teraflops/foot^3. CPUs are actually quite useful here, because GPUs tend to need a CPU to look after them anyway. Phi is also pretty good in that space.

      CPU systems are also very good for streaming applications; data can arrive via some DMA from a data acquisition system, and is then right there in memory ready to be processed, immediately.

      In contrast, GPUs have their own memory subsystem, no I/O other than the PCIe interface, and no I/O-friendly API allowing another device to DMA data directly to them; data has to hop through the CPU's memory space first so that CUDA can then deal with the transfer to the GPU card. This means that once the data has arrived in the system you can't just get on with it; there's still more memory shuffling to do.

      So a well setup / programmed CPU system can hum along at near 100% 24/7 (I've done several like that), whilst a GPU is inevitably processing in bursts, playing catch up.

      NVidia are slowly learning this, with NVLink beginning to be useful; still a long way to go though. Once they've learned about DMA patterns, addressing, and crossbar switches for NVLink that's when it'll actually start being useful. But that's when it'll become just yet another network technology that costs as much as Ethernet to develop but without the mass market appeal...

  4. Anonymous Coward
    Anonymous Coward

    Meanwhile at CSCS

    Seems the Nvidia boat has left already ...

    1. Korev Silver badge

      Re: Meanwhile at CSCS

      They have Intel accelerators there too in some of their other systems. They switched from Opteron to Xeon a few years ago; they'd probably happily switch accelerators too.

  5. teknopaul Silver badge

    The knights who say Phi are going to be spending a bit more time with their shrubbery.

    I'll get my coat.

  6. jms222

    Is this the device made from ancient P54C cores and gaffer tape?

  7. Aladdin Sane

    That's all well and good

    But can it run Crysis?

  8. bob, mon!

    x86?

    Are these things really 32-bit-oriented processors? Or are they x86-64 compatible? (Or - shudder - Itanium?)

    1. Anonymous Coward
      Anonymous Coward

      Re: x86?

      The x200 (Knights Landing) Xeon Phis are full x86_64 CPUs, with all the usual gubbins (but sans virtualization and enterprise-related bits). They can boot unmodified x86_64 Linux kernels, and will run most binaries compiled for modern Xeons (up to and including AVX2). It might also boot Windows (never tried, and don't care) - but the licensing cost may be seriously interesting.

      Each core is slower than a modern Xeon core (by a factor of 2x-3x running unmodified Xeon binaries), but there are a lot of them (64 to 72, depending on the SKU). In real-life usage you will probably want to recompile to use the AVX-512 vector units; in favourable cases this can improve the FLOPS rate by a factor of 4.

      The package also comes with 16GB of on-package MCDRAM, which can be configured as a last-level cache and is quite fast (400+GB/s of application-visible bandwidth), plus six DDR4 memory channels, so you can go up to 384GB. It will take standard PCIe peripherals, and the standard Linux drivers work just fine.

      So it is a quite serious x86_64 CPU. Unfortunately, the prices are also rather non-funny, and given the low volume you are unlikely to do much better than the RCP. If you can't buy in the US, the usual 1:1 USD-to-EUR/GBP conversion makes it worse.

  9. david 12 Silver badge

    Why linux not BSD?

    All my friends were BSD and Sun types. What is the overwhelming advantage of the Linux kernel in this area?

    1. Anonymous Coward
      Anonymous Coward

      Re: Why linux not BSD?

      All my friends were BSD and Sun types. What is the overwhelming advantage of the Linux kernel in this area?

      It already works well enough in most situations, and supports all the bits one might usually need. Oh, and there is no per-core or per-socket licensing cost - that would be a killer.

      HPC tends to be pretty pragmatic as far as the OS choice is concerned - as long as some variation of POSIX is supported, it is usually possible to convince most HPC codes to run. I never had to use BSD in anger, but SunOS and Solaris were/are fairly pleasant for HPC - as were/are Irix, AIX, and UNICOS. Windows, on the other hand ...

  10. Sssss

    So no Phi crossover? What is Intel going to do about x86? A company that thought more was more, not less, performance, and wasn't going to be strong-armed into a leaner architecture.

    Well, the solution is to use existing binary translation software, like that used to run x86 binaries on Arm and others, and design a better processor ISA - one that its own translation modules perform best on - and then you don't need x86. (Oh, that part's happened already.)
