Action at the eleventh hour
Funny how it took Intel a monstrous crash in dominance and certainty to try to make life better for the customer. Shove it and keep failing, let's see some progress from a different player.
This week, Intel and AMD set their decades-old rivalry aside to ensure x86 remains relevant amid growing adoption of competing architectures. The formation of the x86 advisory group, announced at OCP in San Jose, California, is a long time coming and frankly, should have happened years ago. The group, which includes folks …
Many would like to see Intel suffer, but remember that AMD is growing unchecked and their arrogance in dealing with customers is also rising at the same rate. They have made multiple decisions in the last two years putting their customers second to profit.
I have no love for Intel, but recognize that without meaningful competition AMD will just become another monster corporation in its place. Competition benefits the customer. Without it you have a monopoly which never ends well.
You seem to be ignoring both the article and market data. While x86 still dominates on PCs and servers, GPUs are the ones with all the profits and ARM for server is finally starting to take off. In this environment, neither Intel nor AMD will be able to dominate the market, which, nVidia aside, is increasingly starting to look like a cartel of vertically integrated providers: AWS, Google, Microsoft, etc.
I've got a current project which requires 40+ cores because everything has been virtualised into a separate "microservice". Fortunately, we've got some time to investigate before we decide what to go with, but from what I understand so far, memory and bandwidth are more important than CPU oomph, so some kind of oodle-core ARM setup with ≥ 256 GB would probably make more sense than the Xeons we're currently borrowing or the AMDs I'd want if we decide to stay with x86. It may be too early for that kind of migration at the moment, but I'm sure it's going to become more common.
Now that's capitalism.
Oh, we have a competing architecture that is successful. Let's join together to try and ensure that our failing architecture stays relevant.
Never mind that the computing landscape is changing, we have shareholders to satisfy.
Gosh. Talk about dinosaurs.
The future is ARM. Shut up and get with the program.
They can always license the ARM architecture and get into the game at any time - for relatively little investment. They already have customer relationships in place to punt ARM gear if they wish to go that route. There are still big margins in x86 space - cutting down the ISA to size makes a lot of sense (especially the myriad of weird security & system management modes) from the point of view of being able to validate designs & implementations cost-effectively.
ARM implementations have the big advantage of being (relatively) trivial to implement and validate - case in point Fujitsu's A64fx - which represented a huge technical leap forward (HBM memory, hi-speed interconnect, variable length vector support) - but was implemented at a fraction of the budget that AMD & Intel lavish on their latest incremental iteration of x86/AMD64. When it came to shipping the hardware they already had a software eco-system ready to go as well. They could *not* have done that with x86/AMD64 at the time.
> They can always license the ARM architecture and get into the game at any time
AMD already has an ARM license. They were developing an ARM CPU, the K12, but due to their precarious financial position in 2014 they basically went all-in on developing Zen, putting everything else on the back burner, including the K12, which then disappeared from their roadmaps.
1) Intel already had an ARM licence back when Windows ran on ARM handhelds; they sold ARM CPUs.
2) Why is x86 so bloated? Well, maybe ask a company called Intel. All the various extensions, incompatible with each other… while the 64-bit extension is licensed from AMD.
Oh, let me guess: is there any supported OS which can still run 8- or 16-bit applications? (No.) So why is validation so complicated?
Fact is, Windows 10 broke all the 16-bit code I had written. The EXE format and APIs were removed as well, so recompiling isn't possible in most cases.
Sure you can implement anything with a single instruction and a lookup table... That doesn't make it easy to produce something that you can depend on.
The *validation* is a huge part of chip design & implementation...
Try coming up with a validation suite for System Management Mode based on Intel's architecture reference manuals and see how far you get. I'll wager that not even Intel can come up with a single consistent interpretation of their own spec as stated in their published literature... And that's just one mode - without even considering all the ways it interacts with any of the many other modes - and the wacky instructions such as "lock"... Translating that stuff into an alternate instruction stream doesn't actually help you validate it at all.
While that is true, the bigger question is...
"Where is the software stupid?"
Apple provided Rosetta and bingo, your x86 images run on M1/M2/M3/M4 etc hardware. Pretty seamless IMHO.
Do MS want to bother? They want you to run everything in the cloud so my guess is Nah.... We'll pass.
Linux already has at least a dozen distros with ARM ports. Come on Intel/AMD, give MS the ultimatum.
MS has tried other platforms before: remember NT was available on Alpha & Itanium before they threw it in and fell back to Intel/AMD. More recently their spectacular failure to promote their ARM-based Surface slabs. The 'long tail' of Windows apps that ran poorly if at all in emulation has pretty well put paid to that idea for the general market.
Now their Cunning Plan (apologies to Rowan Atkinson) is to get you to run everything in their cloud and just reach it from a thin client/browser on your tablet/handheld. That the majority of Azure VMs are in fact running on some Linux flavor is irrelevant as they're getting paid for the cycles regardless of what OS is burning through them.
> They can always license the ARM architecture and get into the game at any time.
Intel already tried that (and failed). They acquired the StrongARM CPUs from DEC and then replaced them with the Intel XScale (ARMv5TE) family of processors. [see https://en.wikipedia.org/wiki/XScale]
Granted, they were aimed more at the embedded SoC market than the server market. They were actually a pretty decent family of parts, but were a bit on the expensive side compared to other competing ARM SoC parts. One nice thing about them is that (IIRC) they were big-endian so network-intensive stuff ran a little more efficiently.
The future is actually 128-Bits Wide with a COMBINED CPU/GPU/DSP/Vector Array Processor ALL built into one single 3D-layered super-chip that has MANY TENS of TERABYTES of on-board fast DRAM for 1024 cores EACH with 8 hardware-interruptible threads (aka a form of Hyperthreading) for 8192 available processing threads!
We just happen to HAVE that chip all ready to go for download as a world-wide fully free and open-source tape-out design under GPL-3 licence terms! When 3D printed onto Borosilicate Glass bases with copper tracing you can get 60 GHz processors DIY 3D printed at home (575 TeraFLOPS sustained at 128-bits wide!) and when using GaAs on micro-channel-cooled Borosilicate glass at a proper FAB, you can get as fast as 2 THz! (50 PetaFLOPS sustained at 128-bits wide)
We also have a 128-bits wide operating system READY TO GO that has all the major apps needed to run a small, medium and large business that works on these Super-Chips but if you want you can STILL RUN all versions of DOS, Unix, Windows (Win 3.1 to Win-95 to Win-98 to Win-2000 to XP - Win7/8/10/11 and all Servers from WinServer-3 to WinServ 2000 to Server 2022) plus Linux and Android within MULTIPLE 16-bit, 32-bit and 64-bit Securely Partitioned Sandboxes and run them ALL AT THE SAME TIME on a single super-chip to such an extent that you can run a NESTED Windows or Linux Hypervisor AND ALL of its client server OS'es and end-user OS'es WITHIN a separate sand-box!
I think it's NOW TIME to UTTERLY DESTROY absolutely ALL of the major chip designers/manufacturers: Intel, AMD, ARM, IBM, Qualcomm, Apple, Microsoft, NVIDIA, NEC, etc.
128-bits Super-Chips and 128-bit Wide Operating Systems ARE HERE NOW !!!!
V
Done! The North Canadian Aerospace website is going LIVE in a few weeks! An entire tape-out design is being readied for DIY 3D printing/CNC-machining on flat panels of Borosilicate glass which cost under $50 USD for a 50 cm by 50 cm at 8 mm thick plate. The entire design was PERFECTED for home-based/small-office printing so that people could MAKE THEIR OWN CHIPS AT HOME! I'm running a few of my super-chips right now at 60 GHz for 575 TeraFLOPS SUSTAINED at 128-bits wide. 1024 cores with 8-way hardware-interrupt-based hyperthreading per core so 8192 threads are available for your grid-enabled applications.
We custom-designed an INEXPENSIVE METHOD to make SUPERIOR TRANSISTORS that have ultra-high switching speeds (i.e. less than one nanosecond into the mid-picoseconds range!) and ultra-high resistance to overvoltage destruction, so that superior performance within high levels of EMI/RFI interference and high-radiation environments is supported. High frequencies starting at 1 GHz up to 10 THz are supported for ultra-high-speed clock rates AND this system can work at high temperatures up to 400 Celsius! This means we don't have to worry about cooling so much! The transistor technology is called Layered Intermetallic Ceramic Field Effect Transistor (LICFET -- Yes! We did that acronym on purpose!) and can be created via 3D-printed powder deposition plus laser sintering means.
The line traces have to be 200 nm and above BUT that is AOK, since we are printing on very large Borosilicate glass plates using a 3D layered approach which supports our multi-BILLION-count number of transistors. We also figured out an inexpensive way to CREATE actual 200 nanometre line traces using home-based DIY 3D printing, which took a LOT of supercomputing time to model and put into real-world practice. The line-trace creation technique will be explained in the documentation release IN FULL so that people can properly understand HOW a home-based DIY 3D printer and common ceramic and metallic powders of a micron and less in size can get down to actually creating 200 nanometre line traces.
The 128-bits wide operating system is ALSO READY TO GO!
AND.... this is NOT SARCASM or joking around! This is a VERY REAL project that has been running for OVER TWENTY YEARS in our Vancouver, Canada-based laboratories! We are a large under-the-radar Aerospace company with SIGNIFICANT technical resources and multi-science-field expertise. It is NOW, in 2024, that we are doing a full world-wide, fully free and open-source (GPL-3 licence terms) public disclosure! We are READY TO GO!
V
Why is it going live in "a few weeks" and not immediately? Surely a company of your immense and undoubted technical ability could spin up such a basic thing as a website in under 10 minutes using the huge computing resources at your disposal? It almost sounds like you're kicking the can down the road (again!) and there's no reason for you to do that, is there?
> We also have a 128-bits wide operating system READY TO GO that has all the major apps needed to run a small, medium and large business that works on these Super-Chips but if you want you can STILL RUN all versions of DOS, Unix, Windows (Win 3.1 to Win-95 to Win-98 to Win-2000 to XP - Win7/8/10/11 and all Servers from WinServer-3 to WinServ 2000 to Server 2022) plus Linux and Android within MULTIPLE 16-bit, 32-bit and 64-bit Securely Partitioned Sandboxes and run them ALL AT THE SAME TIME on a single super-chip to such an extent that you can run a NESTED Windows or Linux Hypervisor AND ALL of its client server OS'es and end-user OS'es WITHIN a separate sand-box!
But, can it play Crystalis?
I think the future is ARM for those that are vertically integrated; for the rest, not so much, at least in the server space. Very few can afford to be vertically integrated to that level; perhaps at this point that is mostly companies with trillion-dollar market caps. I wouldn't be surprised if in a decade or so RISC-V takes over from ARM as these vertically integrated companies go to take another layer out of the supply chain.
I assume you're talking about:
"Ultimately, the union boils down to this: the last thing anyone needs or wants to deal with is compatibility edge cases because Intel or AMD decided their implementation of the x86 ISA was better than the others'."
In which case, yes, there are multiple others. The only two that matter most of the time are those two, and each is most likely the "other" the article has in mind, but you do have Zhaoxin/Via as well. Most of the ones I've seen have been lackluster, but as AMD64-compatible chips, they could come up with interesting implementations. Technically, there are a few others, but they mostly make 32-bit ones for embedded, so they're probably not doing anything that new. Still, "others".
16bit is already effectively gone.
It also doesn't matter, because emulators are fast enough - too fast, in many cases!
32bit is another thing entirely.
There's still a huge amount of 32bit x86 Windows and Linux software that's unlikely to have a 64bit version or replacement any time soon.
But again, emulators are rather fast.
If you read other news, what Intel is proposing is making a processor that boots straight into 64-bit mode with 32-bit compatibility support. No option to boot into 16-bit real mode, remove every single trace of 16-bit support (both 16-bit real mode and 16-bit protected mode) from the silicon, and also remove a lot of 32-bit stuff that is not used anymore (like the virtual 8086 mode that Windows 3.x and 9x used to run DOS programs).
Well-behaved 32-bit software (Win32 executables onwards, and most 32-bit *nix software) shall be fine, but only under a 64-bit OS, as that will be the only type of OS those chips will run...
V86 Mode Extension support has been broken in AMD processors since Zen 1. If I try to boot Windows 9x or ME in a hypervisor on my Zen 3 processor, it'll throw a slew of errors unless I use patched versions of the OS. So folks like me have already turned to emulation to run old apps that refuse to work on any other OS.
This is something Intel and AMD should have started doing 10 years ago. The architecture is an absolute train wreck of optional features.
To give an example, each CPU has a list of feature flags that your application should supposedly look at to see if it has that feature before deciding whether or not it can use it. I just had a look at the CPU that I'm writing this post on, and there are 174 of them. 174.
I am trying to imagine a world in which any software developer is going to examine each PC their code is running on and decide which of 174 different alternate implementations to use. Realistically, you are going to look at most for one or two features that could make a big difference and ignore the rest, which may as well not be there.
You are also going to ignore new features until they have been around long enough that the hardware that supports them is the vast majority of the installed base, or at least the majority for whatever your market is.
If you are using say SIMD, that means targeting SSE4.2 and providing a non-SIMD fall-back. AVX was actually slower than SSE4.2 on at least some CPUs, (good luck figuring out which ones), AVX512 is still on only a small share of hardware, and Intel have focused on slicing the x86 market into ever thinner segments of inconsistent feature sets in order to try to get the maximum revenue from each segment.
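In practice the runtime check ends up looking something like this (a minimal sketch using the GCC/Clang builtins; the function names and the SSE4.2-vs-generic split are made up for illustration):

/* Check one CPU flag at startup and pick an implementation; ignore the
   other ~173 flags, exactly as described above. */
#include <stddef.h>
#include <stdio.h>

/* Generic fallback: plain C, runs on any x86-64. */
static long sum_generic(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Same C source, but the compiler may use SSE4.2 when vectorising it. */
__attribute__((target("sse4.2")))
static long sum_sse42(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

static long (*sum_impl)(const int *, size_t) = sum_generic;

int main(void)
{
    __builtin_cpu_init();                  /* harmless to call explicitly */
    if (__builtin_cpu_supports("sse4.2"))  /* one flag out of the 174 */
        sum_impl = sum_sse42;

    int data[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    printf("sum = %ld\n", sum_impl(data, 8));
    return 0;
}

Anything beyond that one check is rarely worth the maintenance cost, which is rather the point.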
The idea that x86 offers great backwards compatibility doesn't stand up to scrutiny. For example, I have an old, functioning, PC that won't boot a modern mainstream 32 bit Linux distro (of the few 32 bit ones that remain) because it doesn't have the "popcnt" instruction. For those not familiar with it, popcnt counts the number of bits set in a word. Even software that is as widely used as the Linux kernel cannot be bothered to work around all the myriad different x86 feature sets; they just pick a few and go with it, and if there is hardware still out there that doesn't work with it, then too bad.
The question today is whether Intel and AMD have left things too late. Other architectures without all that baggage may be able to make better use of their available space to offer better performance, lower power consumption, lower price, or some combination of the three to make life difficult for Intel and AMD.
But this is the job of the compiler. (hand rolled assembler excluded, though I've not seen any that uses newer features except in very niche areas)
Incompatibilities are due to compile-time options.
source-build for the appropriate CPU, and you should be fine.
One of my servers is ancient (ivybridge), yet runs most binaries because most are pre-compiled using the lowest common denominator of features. Ones that don't run, I just compile from source.
Equally, my newer servers compile with all the extra features the compiler supports.
Compiling from source used to be one of the main advantages of open source, but these days it is becoming a lost art (I realise that llvm and rust compile times are a factor!)
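For what it's worth, the compile-time side of that looks roughly like this (assuming GCC or Clang, where building with something like -O2 -march=native defines feature macros for the build host):

/* Which instruction-set baseline was this binary actually built for?
   These macros are set by the compiler based on the -march/-m flags. */
#include <stdio.h>

int main(void)
{
#if defined(__AVX2__)
    puts("built with AVX2 code generation enabled");
#elif defined(__SSE4_2__)
    puts("built with SSE4.2 code generation enabled");
#elif defined(__SSE2__)
    puts("built for the x86-64 baseline (SSE2)");
#else
    puts("built without SSE at all");
#endif
    return 0;
}

Which is exactly why a binary built on a new box may refuse to run on the old Ivy Bridge one, and why building from source on each machine sidesteps the whole problem.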
My last place used the Intel compiler suite for the longest time because it could supposedly generate alternative code paths for different processor variants. Only problem was the firm refreshed with AMD-based servers shortly afterwards... We eventually wrestled the code base back onto gcc.
> But this is the job of the compiler. (hand rolled assembler excluded, though I've not seen any that uses newer features except in very niche areas)
> Incompatibilities are due to compile-time options.
That's true, but not everyone has the option of compiling from source. A commercial product might be able to ship a couple of pre-compiled binaries, but won't have 174 of them.
All too often you need to target compiler options for performance, especially for things like a compute-intensive kernel. Compiling for the lowest common denominator of a processor may mean passing on considerable performance gains from more complete/recent CPUs.
Some older CPUs (uVAX comes to mind) handled this by having non-implemented instructions trigger a machine trap, the trap handler could then emulate the instruction from its "bigger brother" in software and return to the main flow. Often gave poorer performance, but at least things would run. I don't know if any of the x86 processors have an equivalent.
Unimplemented instruction traps going off to software implementations can have interesting side-effects.
First, an aside. The actual O/S calls on the Dec10 (if not earlier) were implemented via UUOs. But I digress.
In the late 70's/early 80's, the research group I worked for started their Dec10 replacement plan. Put in a VAX 11/780 with the Floating Point hardware. Soon after it was installed, I left.
Then it was time for another system. Was it to be a VAX or an IBM? Now, Rutherford Lab having an IBM was quite the draw in that direction as it made data exchange easier, FP formats not being quite standardised as yet. So, benchmarks were run (you know where this is going now, don't you?)
The IBM 370/whatever that was being contemplated won, so it was ordered. Some time (months??) later during maintenance, it was discovered that the VAX FP unit wasn't functioning, and the FP instruction set was being transparently emulated. Fixed it, the VAX now out-performed the IBM. Red faces all round.......
The compiler won't in most cases use a lot of these optional features because there is often no way of expressing them in a generic portable high level language such as C. Your options are generally either assembly language or compiler extensions (which allow you to specify assembly language instructions in the middle of a C program).
In some cases the compiler will automatically use some features if present, but generally only for very simple cases. In many cases you need to actually use a different algorithm to be able to use the instructions effectively. That isn't something that compilers are good at figuring out for themselves. You could easily need to use one algorithm for floating point, another algorithm for unsigned integer, a third algorithm for signed integer, and a fourth algorithm for non-optimized generic C for fall-back mode. Been there, done that.
I have an open source C project where I support Linux, Windows, and BSD. Windows users get a pre-compiled binary without optimizations because many Windows users don't have a compiler installed and asking them to install one is asking a bit too much.
Linux and BSD users get a source distribution only because unlike Windows installing a compiler there is very easy. The Linux build script looks at the CPU flags to see if it has the optimized features. If so, it compiles for these (there are a bunch of if-defs in the code). If the flags are not present, it uses generic fall-back mode.
BSD users also get the generic fall-back mode because, while Clang/LLVM is reasonably GCC-compatible, one of its weak areas is poor support for compiler extensions. They try to make up for it by automatically using the instructions if present. This is good when it works, but usually it doesn't because, as I said, you actually need to write a different algorithm in many cases, one that can't even be expressed in standard C (which is a lowest-common-denominator language, for the sake of portability).
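The shape of those if-defs is something like this stripped-down sketch (not the actual project code, just the pattern, assuming GCC/Clang and a 64-bit build), using popcnt as the example:

#include <stdint.h>
#include <stdio.h>

#if defined(__POPCNT__)
#include <immintrin.h>          /* pulls in the POPCNT intrinsics */

static unsigned bits_set(uint64_t x)
{
    return (unsigned)_mm_popcnt_u64(x);   /* single POPCNT instruction */
}
#else
static unsigned bits_set(uint64_t x)
{
    /* Portable fallback: classic bit-twiddling count, no special instructions. */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}
#endif

int main(void)
{
    printf("%u\n", bits_set(0xF0F0F0F0F0F0F0F0ULL));   /* prints 32 */
    return 0;
}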
Now take all of these different versions on different platforms with different compilers, and realize that each of these has to be not just written and maintained, but also tested. You need to have a very thorough automated testing system for each of these combinations.
ARM has been a different story. I have 32 and 64 bit versions, but as an application (as opposed to OS) writer, ARM has been a pleasure to work with compared to x86. OS initialization has been more difficult on ARM (due to lack of boot standards on different boards) but that isn't something that has affected me as an application level developer.
x86 CPUs generally have a consistent board support definition / ISA.
ARM is just an IP core. The products made with them are not compatible with each other. iPhones, Android phones, M1... series Macs, Raspberry Pis etc are NOT compatible with each other. Buy an x86 board and you can run any OS built for it, Windows, Linux or BSD, with very little hassle.
Anyone who has had to develop for multiple ARM platforms will tell you this and wistfully yearn for x86.
"The idea that x86 offers great backwards compatibility doesn't stand up to scrutiny. For example, I have an old, functioning, PC that won't boot a modern mainstream 32 bit Linux distro"
What you're complaining about is a lack of forwards compatibility. Your pre-SSE4.2 CPU cannot run software compiled with instructions from the SSE4.2, or newer, instruction set. That's a "well, duh" moment. Obviously an old CPU cannot natively execute instructions which didn't exist when that CPU was created. The same is true for ARM and MIPS.
Backwards compatibility is great - a modern x86-64 machine can run binaries compiled prior to the release of SSE4.2.
174 feature flags is a bit misleading, though:
- a large fraction of them are only of concern to an operating system kernel, are already handled by existing kernels, and that handling is encapsulated so that end users will never know a thing (except possibly some resultant performance or capability difference)
- many are for extremely obscure or forgotten features which never caught on and simply won't be of interest
- most of the rest are captured as large groups under the headings of 'x86_64 v1' through 'v4', as overarching compile-to targets available to OS distributors, software designers and builders
Somewhat orthogonally, other groups of such flag-related decisions are encapsulated into various libraries. If you need to do some sort of SIMD math, you can write your own x86 assembly using SSE, AVX, AVX-512, etc. -- or you can use one of many many libraries which encapsulates the conceptual operations. The library 'knows' about the various CPU instruction families (and in some cases, also, how to outsource this to a GPU, NPU, or whatever newfangled whatsit is available); knows how to spot-test for presence and then spot-benchmark for performance, and therefore, how to use whatever's available, to best effect.
So yes, all that throbbing complexity exists in the background; and you *can* engage with it if you really insist on it; but there's also a huge existing software ecosystem which hides most of the complexity, should you choose to go that route.
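To make that concrete (a hedged sketch, assuming GCC 6+ or a recent Clang on an x86-64 Linux/glibc system; the dot() function is made up for illustration): the target_clones attribute is one such encapsulation. The compiler emits several versions of the function and the dynamic loader picks the best one for the CPU at startup, so the application code never touches a feature flag:

#include <stddef.h>
#include <stdio.h>

/* Three compiled variants from one C source; selection happens at load time. */
__attribute__((target_clones("default", "sse4.2", "avx2")))
double dot(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

int main(void)
{
    double a[4] = { 1, 2, 3, 4 }, b[4] = { 4, 3, 2, 1 };
    printf("dot = %f\n", dot(a, b, 4));   /* prints 20.000000 */
    return 0;
}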
And as mentioned elsewhere in this discussion, ARM is certainly not free of such things. There are lots of generations of the ARM Instruction Set Architecture, each of which has plenty of optional design-in extensions. If you told me that the overall level of complexity is lower -- using a headline figure like 'there are only 67 architectural flags to be aware of' (number pulled out of thin air) -- I would believe you. It's still a level of complexity which demands support in the OS kernel, compiler, libraries, etc. Overall it might be a little bit smoother, but this is really a minor quantitative difference, not qualitative.
There is crossover of ownership of these companies, no?
So it is in their interest to consolidate, to save money. Probably an even bigger money maker would be ISA licensing, like ARM does.
Though it is a bit late.
In the end, users don't care which ISA their CPU runs. Whether on ARM or x86, Chrome looks exactly the same.
Intel have a good RISC-like implementation underneath the x86_64 instruction set, right?
And AMD has their chiplet technology. For nearly 10 years or so I have been waiting for AMD to do a CPU/GPU/GPGPU/DPU/DSP all in one package, and to release it for desktop, mobile and devices.
And I'm still waiting.
Getting rid of X86 won't happen. Just stop with this ARM pipedream. You're making a fool of yourself. RISC CPUs simply do NOT have the same processing power and capabilities that CISC CPU do.
There's a reason they call x86 a CISC and it's because it can do complex integer set computing. It can perform complex mathematical and algebraic equations and calculations to run programs and applications with extreme precision and efficiency.
RISC is called Reduced Instruction Set Computing for a reason. It's NOT meant to process high level mathematics or operate with efficiency. It's a low power part for basic instruction sets for small form factor devices.
RISC has been attempted many times and every time it has tried to even begin to replace x86, it's been a complete failure. The Snapdragon X still can not emulate x86 within 65% efficiency in CPU instructions per cycle. Even the Just-In-Time Recompilers used by LLVM/Clang can run RISC code at 75%+ efficiency, and RPCS3 is a living statement to that.
X86 can emulate ARM at 75%+ if you look at BlueStacks.
SPARC is dead, IA64 is dead, PowerPC is barely relevant, the Motorola 68K is dead. The Ricoh 65c816/6502 is dead. Every attempt has failed. Because it simply can not be done.
Not entirely accurate
CISC - complex instruction set
This means you can issue a "complex" instruction to the CPU in one-hit such as "multiply these two 8x8 matrices together". The CPU then breaks this down into the multiple discrete mathematical operations required
RISC = Reduced Instruction Set
The CPU only supports more simple operations such as "add these 2 numbers" or "multiply these 2 numbers", meaning that it is the job of the compiler to break down your complex instruction into the multiple simple operations
SPARC (which you name-drop) was used to run Solaris servers, which during the dot-com bubble (yes, I am that old) made up a significant chunk of all the servers powering large websites and enterprise workloads such as Oracle / DB2
Those complex instructions which as you say are broken down into discrete operations can be interrupted by a kernel call. When the interrupt is over, the context of the complex instruction has more than likely been lost and the CPU has to start it all again.
Many, many years ago and in a former life, I did some benchmarking of the VAX POLY instruction. The result was that for most operations it was faster to code the thing in core instructions than use the POLY instruction. While things might have changed, the CPU designers are faced with how to handle long running instructions and interrupts.
Reaper X7 wrote:
"...There's a reason they call x86 a CISC and it's because it can do complex integer set computing. It can perform complex mathematical and algebraic equations and calculations to run programs and applications with extreme precision and efficiency. RISC is called Reduced Instruction Set Computing for a reason. It's NOT meant to process high level mathematics or operate with efficiency. It's a low power part for basic instruction sets for small form factor devices..."
Is this some kind of joke? RISC vs CISC has nothing to do with performing "complex mathematical and algebraic equations and calculations to run programs and applications with extreme precision and efficiency." or "high level mathematics". If it was, we'd still be using the original VAX-11/780 ISA, with polynomials in hardware.
And I don't recall IBM Regatta (Power), Sun Starfire (SPARC) or H-P Superdome (PA-RISC) being examples of "small form factor devices"
"There's a reason they call x86 a CISC and it's because it can do complex integer set computing."
No. That's really not the reason.
Firstly the term "CISC" only came into being as a counterpoint to "RISC".
The point of CISC (before that term was coined) was to minimize the cost of fetching an instruction from (very slow) main memory. It was a hunch that was not actually based on any empirical evidence - and the instructions chosen to be implemented were primarily driven by marketing and implemented by a mixture of microcode and kludgy expansion modules. The emphasis was on making writing assembly & machine code easier - the VAX-11 had stuff like string search and replace for example. RISC changed the focus to making the compiler writer's life easier (I always found writing RISC assembler easier than CISC too for that matter - far fewer corner cases to worry about).
Reduced Instruction Set Computers were designed for efficient implementation of hardware and software (ie: compilers). The first machine designed this way was the 801, the team was led by John Cocke. That guy worked on the IBM S/360 - the original multi-billion-dollar development budget CISC machine, and that work showed him that compilers were not using the instruction set efficiently. So for the 801 project he set out to develop an ISA with the goal of keeping the hardware *AND* the compilers simple and efficient. He achieved that goal - to the point where his tiny 801 processor + compiler outperformed the bigger (and more expensive) CISC S/360 contemporaries. The prize for his success was for IBM to hide the 801 in channel processors and other ancillary gear that wouldn't erode their humongous margins on mainframes.
Even when CISC was king (and IBM were raking in the cash from S/360 and its descendants) the fastest and most efficient machines of the era were rather RISCy: case in point the CDC6600 and early CRAYs - which were an order of magnitude faster and more energy efficient than their CISC (IBM) contemporaries.
These days all high end CISC chips are implemented as a "front end" instruction decoder feeding a bunch of RISC-style "micro-ops" to a very RISC/VLIW style backend. There aren't any true CISC chips left - they are all RISC style back-ends with a CISC style front end on them. This entire argument is over marketing really - the engineering battle was lost to RISC in the 90s whether folks want to accept it or not. :)
From my PoV, engineering wise, the last great (pure?) CISC was the P6 core - which first appeared as the Pentium Pro in the mid 90s. Everything after that has either sucked (eg: Pentium IV) or had a CISC front end grafted onto a RISC/VLIW backend.
All the super-intense compute stuff is done with GPUs today - which consist of many RISC style cores operating in parallel. Intel tried a CISC alternative to that with Xeon Phi - suffice to say it sank without trace (because it was sooooo slow - and inefficient).
Going forward I suspect the future will look like Fujitsu's A64fx - RISC cores with a monster vector unit and a large helping of HBM memory on the side... Just like what NVidia are punting right now in fact. ;)
My memory of limited experience with MIPS assembly on R4000s long ago is that I really didn't like having to figure out what I could do to fill the time while the processor did something that my next instruction depended on. I'd end up putting NOPs in at first and then, when everything was working, shuffle things around. That sort of thing and "Relegate Impossible Stuff to the Compiler" circulating as an alternate meaning for "RISC" are fun memories.
I couldn't agree more on the Xeon Phi. Spent $5k on a Knight Corner board to end up wondering what was I thinking.
I confess that I loathe Intel for what they've done and their malign influences on the industry - but I did have hopes for Phi before the details of their design started to appear... When the details appeared - such as it being organized around a ring bus, and their being absolutely hell-bent on using a Pentium core - I figured they had screwed the pooch - but maybe they'd get away with it through the magic of their processes & fabs... They didn't get away with it.
RISC is really about the approach you take to designing the ISA and implementing it - specifically using real-world metrics to direct your design and implementation - rather than "feels" and "vibes". I chose to contrast System 360 vs CDC6600 because (even though they were shipped before RISC was a thing) they reflect the contrasting design philosophies embodied by CISC & RISC. IBM's T.J. Watson Jr.'s rage-post from 1963 (https://images.computerhistory.org/revonline/images/500004285-03-01.jpg) illustrates the advantages of applying RISC-style principles (CDC 6600) vs the CISC ISA approach (IBM System 360) in terms of performance and development cost.
As an aside it's worth studying the CDC 6600 and the System 360's development - plenty of interesting stuff there and lessons to be learnt (and you can see those lessons being applied - or not - in subsequent generations of machines too). :)
I should correct myself... The P6 core was in fact the first Intel core to have the CISC front end / RISC (micro-op) back-end approach.... I still like it, but it would be more correct to say it was the first "great" CISC frontend/RISC backend rather than the last great CISC design. :)
It would take several pages to explain in detail all the different ways where you are utterly wrong. But in summary :
- I really do not know where to start with the notion that only CISC CPUs can perform "complex mathematical and algebraic operations" with "extreme precision and efficiency". SPARC, MIPS and PowerPC have lengthy track record here, being used for CGI in films, by the oil & gas industry, financial services sector etc etc.
- A RISC is not inherently a "low power part for small form factor devices". The earliest RISC CPUs were used to build servers, workstations and mainframes. IBM dominated enterprise computing with the RS/6000 workstation, and its S/390 CPUs were a CISC ISA running on a version of its POWER ISA RISC platform.
- CISC does not mean "able to do complicated things". CISC means "I have a complicated instruction set whose instructions may take several clock cycles to execute and which you may never use".
- I have no idea why you think the inability to emulate another instruction set at full speed rules out an architecture as being viable.
- the Motorola 68K and Itanium are not RISC architectures. 68K is "dead" because it can't run Windows, and Itanium was simply a poor design.
I remember life 25-30 years ago. Nobody in their right mind would have deployed x86 in the enterprise server space, it simply was not done. Every RISC CPU wiped the floor with x86 at the time. They lost because x86 was cheaper and could run Windows, and Intel were eventually able to hotrod their rubbish architecture to make it run fast.
These days, the CISC vs RISC thing does not matter. It was important in the 1980s/90s when chip real estate was at a premium, and RISC could use the space vacated by complex instructions to make simple instructions run much faster. Nowadays, everything including x86 is implemented on a RISC core with the higher level CISC instructions microcoded.
I mean, sure, as someone who runs Linux or *BSD on a server, it doesn't really matter if you have 16- and 32-bit support... but then again, it doesn't really matter if you have x86 or ARM or whatever.
The area where this does matter is the Windows server market. There, software is distributed without source code... and backwards compatibility is vital. The portion of that market that can't just switch to normal operating systems is "application servers for legacy applications". That old business-critical 16-bit WinAPI application the company bought in 1993 will certainly not run on ARM.