Intel adds fresh x86 and vector instructions for future chips

Intel has revealed two sets of extensions coming to the x86 instruction set architecture, one to boost the performance of general purpose code and the second to provide a common vector instruction set for future chips. Some of the details were revealed on Intel’s developer website, showing the Advanced Performance Extensions ( …

  1. Yorick Hunt Silver badge

    Great...

    More buggy microcode to exploit, while adding net zero worth as nobody wants to compile architecture-specific code to utilise the new functionality.

    1. Mike 137 Silver badge

      Re: Great...

      "nobody wants to compile architecture-specific code to utilise the new functionality"

      But the benefit is that it will ensure profitable churn for the industry - applications using the new instructions will not run at all on older CPUs, so we'll all have to 'upgrade' our hardware.

      1. Korev Silver badge
        Boffin

        Re: Great...

        It's not difficult to create binaries that are clever enough to work out what chip they're on and run the right bit of code accordingly.

        I guess it's a case of willingness...

        1. Johannesburgel12

          Re: Great...

          You know AVX512 alone has 20 optional extensions that are not implemented by every Intel CPU, right?

          1. Korev Silver badge
            Pint

            Re: Great...

            I didn't!

            Do you have a link with some info?

          2. CowHorseFrog Silver badge

            Re: Great...

            Which makes them worthless, because who wants to detect and load 20 different variants of code, some of which use old instructions and some of which use new ones?

      2. Grogan Silver badge

        Re: Great...

        That's not what happens. What happens is that nobody dares to distribute binaries with them turned on.

        If you want everyone to be able to run your binaries, all you are guaranteed on the x86_64 arch is SSE2. You can up that a little with games (SSE3) but there are still CPUs in use that don't support SSE4.2 and popcnt, never mind AVX. So unless you do CPU detection gymnastics, you can't even use them. (And simply going by the model number or listed instructions won't do; you have to run code tests or you're going to have a lot of complaints of crashing games.)

        1. Hi Wreck

          Re: Great...

          Most binaries use shared libraries. They can be compiled/optimized for the specific CPU. However, all this stuff that Intel is doing seems to me to be adding lipstick to a pig.

        2. that one in the corner Silver badge

          Re: Great...

          Hardly "gymnastics" - there are specific registers that can be read that allow you to determine what parts of the instruction set are available - as the article says:

          > Developer code will only need to check three fields, according to Intel: A CPUID feature bit indicating that AVX10 is supported, the AVX10 version number, and a bit indicating the maximum supported vector length.

          You only need to write the detection code into one library (and then keep it updated as new material arrives) - or find someone who has already done that - and then use the flags that extracts as required.
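
A minimal sketch of the three-field check quoted above, written in Python purely for illustration. Python cannot execute CPUID itself, so the function takes the raw register values as integers. The leaf numbers and bit positions are assumptions based on a reading of Intel's AVX10 specification and should be verified against it; the names `Avx10Info` and `decode_avx10` are hypothetical.

```python
from dataclasses import dataclass

# ASSUMED bit layout (verify against Intel's AVX10 specification):
#   CPUID.(EAX=07H, ECX=01H):EDX[19]  -> AVX10 supported
#   CPUID.(EAX=24H, ECX=00H):EBX[7:0] -> AVX10 version number
#   CPUID.(EAX=24H, ECX=00H):EBX[17]  -> 256-bit vectors supported
#   CPUID.(EAX=24H, ECX=00H):EBX[18]  -> 512-bit vectors supported
AVX10_FEATURE_BIT = 19
AVX10_VERSION_MASK = 0xFF
AVX10_VL256_BIT = 17
AVX10_VL512_BIT = 18


@dataclass(frozen=True)
class Avx10Info:
    supported: bool
    version: int
    max_vector_bits: int


def decode_avx10(leaf7_sub1_edx: int, leaf24_ebx: int) -> Avx10Info:
    """Decode the three fields from raw CPUID register values."""
    # Field 1: the feature bit. Without it the other fields mean nothing.
    if not (leaf7_sub1_edx >> AVX10_FEATURE_BIT) & 1:
        return Avx10Info(False, 0, 0)
    # Field 2: the version number.
    version = leaf24_ebx & AVX10_VERSION_MASK
    # Field 3: the widest vector length advertised.
    if (leaf24_ebx >> AVX10_VL512_BIT) & 1:
        max_bits = 512
    elif (leaf24_ebx >> AVX10_VL256_BIT) & 1:
        max_bits = 256
    else:
        max_bits = 128
    return Avx10Info(True, version, max_bits)
```

A detection library along the lines described would call something like this once and hand the result to the rest of the program.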

          If you want to do some gymnastics, you can set up your build system to auto-generate multiple object files by recompiling the source once per architecture variant you want to support, and also to generate the single function that the rest of your code will call. That function simply checks the flags (from the library, above), sets a pointer to the appropriate one of the multiple objects and then invokes via that pointer (you only do the check and assign on the first call, when the pointer is still null, of course).

          Again, you set up that build the once and then just need to maintain it in sync with the identification library.

          As far as tricksy coding goes, that isn't really very hard gymnastics - call it a gate-vault rather than jumping over the box.
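
The check-once-then-call-through-a-pointer pattern described in this comment can be sketched as follows. This is an illustration in Python, not how a real build would do it: in C the variants would be separately compiled objects and `chosen` a function pointer. The feature names, the `LazyDispatch` class and the stand-in variant functions are all hypothetical.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple


def sum_baseline(values: Sequence[int]) -> int:
    # Stand-in for the object compiled for the lowest common denominator.
    return sum(values)


def sum_avx2(values: Sequence[int]) -> int:
    # Stand-in for the same source recompiled for a newer target.
    return sum(values)


class LazyDispatch:
    """Pick an implementation on the first call, then reuse it."""

    def __init__(
        self,
        variants: List[Tuple[str, Callable]],  # most capable first
        fallback: Callable,
        detect: Callable[[], FrozenSet[str]],  # the detection library
    ) -> None:
        self.variants = variants
        self.fallback = fallback
        self.detect = detect
        self.chosen: Optional[Callable] = None  # the "null pointer"

    def __call__(self, *args):
        if self.chosen is None:
            # First call only: ask the detection library what is available
            # and remember the best matching variant.
            features = self.detect()
            self.chosen = next(
                (fn for name, fn in self.variants if name in features),
                self.fallback,
            )
        return self.chosen(*args)


# Hypothetical usage: a machine that only reports SSE2.
vector_sum = LazyDispatch(
    [("avx2", sum_avx2), ("sse2", sum_baseline)],
    sum_baseline,
    lambda: frozenset({"sse2"}),
)
```

Every later call to `vector_sum` skips detection and goes straight through the stored reference, which is the cheap part of the scheme.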

      3. Fred Goldstein

        Re: Great...

        Speaking as a Windows user, I know this may seem a bit out of character, but this does seem to be an advantage for a lot of Linux applications, where it is more common to make source code available to be compiled with whatever options the target can use...

        1. Grogan Silver badge

          Re: Great...

          Yes, the compiler has heuristics that can choose instructions if you give it a target that supports them but that doesn't necessarily mean it's going to use those instructions in optimal places. For that, you have to write code to benefit from it and pass specific flags in your compile commands (e.g. -mavx2) for those objects.

          1. MacroRodent

            Re: Great...

            The added general-purpose registers are going to benefit almost all code. Intel processors have always been "register-starved".

    2. Anonymous Coward

      Not how it works

      Modern compilers generate binaries that support multiple architectures. The code chooses which version to run at run time. https://www.intel.com/content/www/us/en/docs/cpp-compiler/developer-guide-reference/2021-8/ax-qax.html

      1. CowHorseFrog Silver badge

        Re: Not how it works

        And now we know why 32GB isn't enough after loading a few apps.

    3. Gaius

      nobody wants to compile architecture-specific code to utilise the new functionality

      No-one needs to do that. As long as the maths libs on your system are up to date, any existing code that links against them will pick up routines that are compiled for it.

  2. Ken Hagan Gold badge

    Marketing strikes again

    So the various versions of AVX are: (implicitly) 1, 2, 512, 10, 10/256 and 10/512.

    I take it that no numerate person was involved in choosing these names.

    1. Korev Silver badge
      Alien

      Re: Marketing strikes again

      Next you'll be saying that Windows 11 is newer than Windows 98 - you're crazy man...

      1. captain veg Silver badge

        Re: Marketing strikes again

        It's newer even than Windows 2000.

        -A.

      2. Dale 3

        Re: Marketing strikes again

        Y2k bug all over again.

    2. Johannesburgel12

      Re: Marketing strikes again

      That's just the major versions. AVX512 alone has 20 optional subvariants, and Intel already announced right now that AVX10.1 will be followed by AVX10.2.

  3. TheMajectic

    Far too many x86 instructions

    There are already hundreds of instructions no one uses anymore, so what's a few more?

    1. Anonymous Coward

      Re: Far too many x86 instructions

      Don't forget your loop to call HCF occasionally.

      1. Crypto Monad Silver badge

        Re: Far too many x86 instructions

        With the fire extinguisher to handle speculative execution

  4. Will Godfrey Silver badge

    ... and probably quite a few that nobody has ever used!

  5. BPontius

    Stripping out 32-Bit arch.

    Thought Intel was going to strip out the 32 Bit architecture from future processors. Pentium 4 was the last Intel 32 bit processor, PCs nowadays are all 64 bit. Even software developers are dumping 32 bit software and drivers.

    1. Proton_badger

      Re: Stripping out 32-Bit arch.

      They’ve published a whitepaper proposing to remove the support for 32bit kernels, not 32bit apps.

      Apple on the other hand dropped support for 32bit apps with the M1.

  6. Kev99 Silver badge

    Once I ask this question. How does an original chip, be it CPU, GPU, Pi, ARM, or whatever, get coded? I can understand how a designer can add registers through modifying the microcode or whatever, but how does the original chip designer get the chip to respond to:

    r

    PB PC NVmxDIZC .A .X .Y SP DP DB

    ; 00 E012 00110000 0000 0000 0002 CFFF 0000 00

    g 2000

    BREAK

    And then how does the designer get the microcode to respond to machine code, and so on?

    1. that one in the corner Silver badge

      So, anyone going to try and fit the answer to that one into a single comment or should we just respond with the ISBN of our favourite 800 page text book on the subject?[1]

      [1] nowhere near my bookshelves at the moment; must get around to memorising those ISBNs one of these days.

      PS not a dig, Kev99, that is a sensible and worthy question, just that there are quite a few things that can (ought to) be put into a decent answer to it!

      1. John Young 1

        I'm waiting for an ISBN to be provided, would make some good bedtime reading ;)

        1. Hull
          Go

          I guess the OP is referring to either

          ISBN 978-0128122761 or

          ISBN 978-0128119051

          1. that one in the corner Silver badge

            Re: I guess the OP is referring to either

            (got a bit distracted, meant to get back here earlier - hope I'm not too late)

            ISBN 978-0128119051 - yeah, pretty much so. "Computer Architecture: A Quantitative Approach" by John L. Hennessy, David A. Patterson. Although, just to be weird, I have the first edition of this, which is rather old now, but I looked at the later editions yesterday and came to the conclusion that the newer editions have lost some of the introductory material in order to squeeze in the more modern material! If you're not too worried about being totally up to date the older edition(s) are really cheap on Abe Books.

            The same authors also have another series out on the hardware/software interface, which may be more interesting/accessible to programmers - they still delve into the architecture below the ASM opcodes (i.e. microcode) and there are now separate editions that cover "general" computers (e.g. the Intel Core i7, IBM Cell ...), ARM and RISC-V. Apparently also a MIPS edition. I haven't looked through those thoroughly, but one or more of those looks like it would be a good companion for my older "Quantitative Approach". Although second-hand 'cos they are rather pricey :-(

            Also:

            I haven't looked at this one, but it is apparently worth a look (and it is cheaper than the ones above): Microprocessor Architecture: From Simple Pipelines to Chip Multiprocessors by Jean-Loup Baer. 2009 vintage, it is said to concentrate on the micro(code) architecture level, rather than any particular ASM opcodes presented to the programmer. It does refer to the Alpha, P6 and Athlon as examples (vintage!).

      2. David Hicklin Silver badge

        The Z80 (with the zx81) book was an epic in itself, now compare that with the 6502 that my Atari 800XL used.....so simple

        1. Crypto Monad Silver badge

          I was a big fan of 6800 and 6502, but IMO the pinnacle was the 6809. Two 16-bit index registers: luxury!

    2. Anonymous Coward

      Hardware Description Language

      This has all probably been superseded, but back in the dawn of time I did a bit of circuit design using VHDL. Maybe start with the Wikipedia page for this (or Hardware Description Language, for another overview). It's a big topic...

    3. Anonymous Coward

      Awesome question! Short answer: it's complicated, bordering on magic.

      I have some ability to program in high level languages. I can do a little bit with assembly. In college I learned about boolean logic, including the theory, the math involved, and actual implementation with basic gates and related components.

      In about my 3rd year of an Electrical and Computer Engineering degree program, I took a course on designing with MSI (medium scale integration) parts such as muxes, encoders, decoders, etc.

      By the end of that course, I was finally starting to see the picture of how software really connects to hardware. I had moderately complex chips (internally formed from primitive gates, which were built from transistors, which were themselves constructed from semiconductors). That hardware reacted according to the state of data stored within the parts that could represent data.

      Unfortunately, I don't understand it nearly well enough to be able to explain it to anyone.

      To some extent, nobody ever really does understand it all. We specialize in our own area, and the rest is just an abstract concept. We understand the interface to the layer "above" or "below" our level and we leave the implementation details to others.

    4. CowHorseFrog Silver badge

      They don't add registers by modifying microcode. Every register needs to exist within the CPU; after all, the values a register holds need to be held somewhere.

      They are probably using FPGA or CPU simulators in software.

      1. that one in the corner Silver badge

        Not quite accurate.

        The registers (can) live in a generic array, the "register file" - and the size of that is certainly set by the hardware design.

        However, they *can* modify how the register file is accessed via the Instruction Set Architecture (i.e. the ASM opcodes we use to program the beast) and by adding/removing/modifying entries in the ISA they can change how many registers our code can see, how those are used and so forth. How much of that is possible is down to how flexible/generic their microcode design is - and I have no idea how flexible any of the Intel microarchitectures are.

    5. 1000100110010111100101110000

      Leaving out a few details...

      Imagine a counter, which is a bunch of gates with a number of output lines that together form a number, and every input pulse causes these outputs to form the next number. Take that number, feed it to your memory bank, fetch the value in that cell. Let's say for the example it's a byte wide, so eight bits. Feed that value to your decode ROM, that for each possible input, in this case that's 256 possibilities, produces ones and zeroes at N output lines, that set up various other blocks of gates for the function the instruction is supposed to perform.

      A gate, by the by, is a binary logic function, like and, or, not, or a bunch of others, sometimes with many more than one or two inputs, and outputs capable of driving one or more, sometimes many more, inputs. Built out of transistors. With them you can build function blocks that probably include a register file, at least one arithmetic unit, i/o sections, memory access (duh), and so on.

      Wait for all the outputs to settle, call it a tick, and go for the next one. IE the "we're stable now, the results are now valid" signal triggers the next cycle at the program counter. (Well, usually there's a clock and the logic just has to be fast enough with the settling to keep up, but let's keep things conceptually simple.)

      N can be more or less as big as you need. Though you may need to have multiple steps for a single (byte) instruction, so the decode sets up a micro-counter to run through a list of micro-instructions that together make up the one (byte) instruction, suspending the main program counter for the duration. Those again are stored in a bit of ROM with each row wide enough to drive the control lines to the various function blocks.

      This is the general idea. In the pursuit of performance CPUs get a lot more complex, but for that, there's the textbooks. Computer architecture courses often study MIPS because it's nice and accessible. Or you could look up the verilog to the "J1 FORTH CPU", it is pretty readable and very simple, no microcode. Or look at, say, the "magic-1" homebrew CPU. Compare and contrast, get accidentally sucked in, and before you know it you've designed and built your own.
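
The fetch, decode and micro-step loop described in this comment can be sketched as a toy simulator. Everything here is invented for illustration: the four opcodes, the control-line names and the `run` function do not correspond to any real CPU. One simplification is that operand fetches advance the program counter from inside a micro-step, whereas the comment describes the main counter as suspended while the micro-counter runs.

```python
from typing import Dict, List

# The "decode ROM": for each opcode byte, the list of micro-instructions
# (control-line settings) that together carry out the instruction.
DECODE_ROM: Dict[int, List[str]] = {
    0x01: ["FETCH_OPERAND", "LOAD_A"],  # LDI n : A = n
    0x02: ["FETCH_OPERAND", "ADD_A"],   # ADDI n: A = A + n
    0x03: ["OUT_A"],                    # OUT   : emit A
    0xFF: ["HALT"],                     # HALT  : stop
}


def run(memory: List[int], max_ticks: int = 1000) -> List[int]:
    """Execute the toy program in `memory` and return the emitted values."""
    pc = 0        # program counter: which memory cell to fetch next
    a = 0         # a single 8-bit accumulator register
    operand = 0   # latch holding the most recently fetched operand byte
    output: List[int] = []
    ticks = 0
    while ticks < max_ticks:
        opcode = memory[pc]  # feed the counter value to the memory bank
        pc += 1
        # The micro-counter walks the ROM row for this opcode, one
        # micro-instruction per tick.
        for control in DECODE_ROM[opcode]:
            ticks += 1
            if control == "FETCH_OPERAND":
                operand = memory[pc]
                pc += 1
            elif control == "LOAD_A":
                a = operand
            elif control == "ADD_A":
                a = (a + operand) & 0xFF  # the adder is only 8 bits wide
            elif control == "OUT_A":
                output.append(a)
            elif control == "HALT":
                return output
    raise RuntimeError("tick budget exhausted without reaching HALT")


# LDI 40; ADDI 2; OUT; HALT
program = [0x01, 40, 0x02, 2, 0x03, 0xFF]
```

Calling `run(program)` loads 40, adds 2, emits the accumulator and halts. In hardware each branch of the `if` chain would be a control line enabling a block of gates rather than a string comparison.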

  7. Anonymous Coward

    For that baseline support across many chips - are they going to make such elements of the spec available to AMD, or the Chinese X86 clones that I can't remember the name of?

    Thought not. I stand to be proven wrong.

    AMD was very smart when it became clear that Itanic was sinking, and they offered up X86-64 extensions as a 64-bit solution, including making the relevant parts of the spec available to competitors.

    1. talk_is_cheap

      AMD and Intel have a long-term cross-license agreement in place for such extensions.

    2. Newold

      Intel Compiler again?

      I guess those elements of the specs will be available to other CPU manufacturers, free or licensed I don't know. But if memory serves me right and my amateurish knowledge doesn't mix things up, wasn't there trouble in the past with the (popular because really good) Intel Compiler checking for a "Genuine Intel" processor and (deliberately?) delivering very suboptimal instructions/code for all non-Intel processors, completely ignoring the instruction capabilities stated by the CPU (like SSE2)? Now, when "Developer code will only need to check three fields, according to Intel: A CPUID feature bit indicating that AVX10 is supported, the AVX10 version number, and a bit indicating the maximum supported vector length", could such things happen again?

  8. hammarbtyp

    1984 all over again

    The i960 had 32 registers, so while welcome, what took them so long?

    1. that one in the corner Silver badge

      Re: 1984 all over again

      All the painful hoops they had to go through to add the new stuff whilst still keeping the entirety of the rest of the x86 model working, which is a massive pile of - random stuff - nowadays.

      The i960 was attempting to get away from all that - and history demonstrated that people just wanted to keep using x86, with its limited and rather non-generic registers 'cos that is what all of the code in the world actually runs on.

      Plus they seem to need to announce "big special changes" like "new AI data types" (huh?[1] you mean signed/unsigned/fixed-point values in various bitwidths? how very novel and AI'ish) rather than "guys, you can use R8 to R15 now, okay". Even if Real Programmers end up using the new registers/opcodes "just" to speed up str(n)len (which is actually a really useful thing to do, but doesn't fill a four-colour glossy on drool-proof paper very well).

      [1] What follows is just my guessing; the AVX10 Version 2 release is going to be *so* exciting!

    2. captain veg Silver badge

      Re: 1984 all over again

      Slightly more recently, Itanic had 128 general purpose registers.

      Turned out that nobody wanted to rewrite their software to take advantage of them.

      -A.

    3. J.G.Harston Silver badge

      Re: 1984 all over again

      32 registers needs a 5-bit field. 16 registers needs a 4-bit field. "4" is a more "natural" number for manipulating bits - says the 8-register PDP11 programmer :)

      An issue is how you fit those bits into the instruction bitfield. The Z80 doubled the number of registers compared to the 8080, but it did it by adding two "swap" instructions, so only the original half was visible at any one time. That was due to the instruction bitfields already being "full". Eg, a ULA operation is 10:ula:reg. So, 8 registers, no more bits to specify any more registers, you can't do eg ADD barry,other_barry, you have to do ld temp,barry, swap, add temp,barry.

      The i86 instruction set will have bitfields something like XXXXXrrrr allowing you to specify 16 registers. To specify 32 registers you need another bit. How they support 32 registers depends on whether there are any unused bits in the bitfield that the register number can expand into, and - importantly - whether existing code has consistently set the "unused" bits to zero so the new CPU doesn't inadvertently access register 16+R instead of R.

      Another method is to use whole sets of unused instructions for 32-reg instructions: the original instructions can only access registers 0-15, and to access registers 0-31 you have to use different instructions, or prefixes (which essentially is just a longer instruction). That's the way the eZ80 and Rabbit went.
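
The "unused bit" hazard described above can be sketched with a made-up instruction word. The layout here (opcode in the upper bits, one spare bit, then a 4-bit register field) is an assumption chosen to mirror the XXXXXrrrr example; it is not the real x86 encoding, and `field_width`, `encode`, `decode_old` and `decode_new` are hypothetical names.

```python
def field_width(register_count: int) -> int:
    """Bits needed to name one of `register_count` registers."""
    return (register_count - 1).bit_length()


def encode(opcode: int, reg: int) -> int:
    """Pack an opcode and a register number (0-31) into one word."""
    if not 0 <= reg < 32:
        raise ValueError("register number out of range")
    return (opcode << 5) | reg


def decode_old(word: int) -> tuple:
    """Old CPU: 16 registers, bit 4 is documented as unused and ignored."""
    return word >> 5, word & 0b1111


def decode_new(word: int) -> tuple:
    """New CPU: the formerly unused bit 4 becomes the fifth register bit."""
    return word >> 5, word & 0b11111


# A well-behaved old instruction: opcode 3, register 5, spare bit zero.
clean = encode(3, 5)

# The same instruction from a sloppy assembler that left bit 4 set.
sloppy = clean | 0b10000
```

Both decoders agree on `clean`. On `sloppy` the old decoder still reports register 5, while the new one reports register 21 (16+5), which is exactly the inadvertent access the comment warns about.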

  9. CowHorseFrog Silver badge

    How many bytes do these new instructions require? Surely having 10-byte sequences just to encode new register references can't be good for performance and memory usage?

    Isn't it time for a new CPU ISA which stops the need for these terribly inefficient encodings?

    1. Crypto Monad Silver badge

      > Surely having 10-byte sequences just to encode new register references can't be good for performance and memory usage?

      It's an arguable point, but these days by far the slowest part is fetching data from DRAM, and that's most efficiently done by long burst reads into cache. Once the cache is warm, generally that's where the code will be executing from.

      Some instructions being "long" are offset by common instructions being "short", and by the fact that you'd need to use multiple "short" instructions to achieve the same result as one "long" instruction.

      > Isn't it time for a new CPU ISA which stops the need for these terribly inefficient encodings?

      Well, ARM, MIPS and RISC-V are there for you to use. ARM seems to be doing pretty well at the moment: MacOS, Linux, Windows all run natively on it. But if ARM tighten the licensing screws too much, RISC-V could take off in its place.

      Intel tried with Itanium, but failed.
