"there's far more interesting stuff happening in tech that matters to everyone"
Maybe actually write an article about that?
Apple Silicon has been the autumn’s hottest news in cool chips. Giving Intel two years’ notice, the first laptop and desktop with the new Arm-based M1 chip have shipped and the benchmarks run. If you believe some of the more febrile headlines, Apple has upended the industry, sparked a revolution and changed the face of computing …
He did, in that very article. Look at this:
"The real problems - the interesting problems - in computing are never solved by an SoC. The real problems - the interesting problems - are moving data, not doing sums in a CPU."
See? I know, it's shocking, isn't it?
He is not talking about moving data to the GPU. By "moving data" he means moving it between computers and/or data storage. For example, having the fastest CPU in the world will not give you faster Netflix streaming, network DB access, better-looking TikTok videos etc. It might reduce your wait to render said video, but uploading it will still take the same time as it would on a 10-year-old Core Duo.
I agree that "moving data" can refer to many different use cases. But one of those use cases is moving data between the CPU and GPU. Apple silicon provides a very impressive way of dealing with that specific use case. The author of the article dismissed Apple silicon as being uninteresting because he claimed it offered no improvement for moving data, yet he completely ignored its impressive moving-data accomplishment. This suggests that (independent of the merits or otherwise of Apple silicon) the article was not well researched/written.
This is actually mentioned in the article, in one of the opening paragraphs.
(To paraphrase your comment: this suggests that the article was not read/understood.) I actually don't want it to sound so nasty, but it is pretty much the same sentence as above.
It might not make Netflix any faster, but it's not just the CPU speed that's impressive. It's the up-to-20-hour battery life in the 13" Pro, or the fact that you just never hear fans on these things.
On my desk right now I've got three laptops: two Macs, one Windows. One Mac is a 16" Pro with an Intel i9, and the Windows machine is an i5. I do not have to do much on either of these to make the fans kick in and the battery drain quickly.
The other Mac is a 13" M1 Pro. Everything seems to open extremely fast, things just feel instant, and as hard as I have tried the battery just will not drain fast, and I still have not heard the fans. I was even playing Cities: Skylines on this thing, smoothly on an iGPU with no fans blasting, and it was not even an Arm app, it was a translated Intel app. My i9 16" Pro, even with a dGPU and running natively, goes bonkers on fans with this game.
Apple Silicon is exciting because it absolutely obliterates its competition. This M1 is really the MacBook Air chip, which would normally have a Y-series Intel in it. It not only wipes the floor with the Y series, it can put up a good fight against processors not even in its class, all while not sucking up much power. What happens when Apple releases the bigger, faster M1s for devices that would traditionally have had a U- or H-series processor in them? Then the following year Apple increases the performance of them all by 20 to 30% on M2s? Then 20 to 30% again over that on M3s? Exciting.
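A quick back-of-the-envelope sketch of why those figures would be exciting if they panned out. The 20 to 30% per-generation gains are the speculation above, not anything Apple has announced, but compounding even the low end over a few generations adds up quickly:

```python
# Illustrative compounding of hypothetical per-generation speedups.
def compounded_speedup(gain_per_gen: float, generations: int) -> float:
    """Relative performance vs. today after `generations`, each `gain_per_gen` faster."""
    return (1 + gain_per_gen) ** generations

# Three hypothetical generations beyond the M1:
print(round(compounded_speedup(0.20, 3), 2))  # low end: ~1.73x
print(round(compounded_speedup(0.30, 3), 2))  # high end: ~2.2x
```

So even the conservative guess roughly doubles single-chip performance in three generations, which is the sort of trajectory Intel has not delivered in years.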
I never hear the fans on my desktop either. And battery life? It doesn't have any.
My ThinkPad only revs up the fan at lunch time, when the AV software does its scan. That spends 99.9% of its time tethered to a docking station, either at work or at home, with large external displays and the dock charges the ThinkPad, so, again, battery life is not relevant to me.
What they have achieved is very impressive, if you need what they have produced. It certainly gives Intel a kick up the butt, but if you are locked into applications that are Windows only, the Apple processors could be twice as fast as their Intel equivalents and it still wouldn't make a difference.
"Might not make Netflix faster."
It does not take a genius to migrate the M1 Mac Mini to a new market segment: convert it to a datacentre-oriented blade and develop a blade chassis and supporting ecosystem around it.
macOS is already just a pretty face and API set sitting on top of a branched version of Unix ... certified to Unix 03 level.
"or the fact the you just never hear fans on these things"
That's because Apple prefers to start throttling the processor first, turning the fans on only as an absolute last resort, instead of giving you as much compute power as possible for as long as possible.
Actually, he was talking about things like adding memory, which would be outside the SoC, and thus about moving data along the bus, negating any advantage of the SoC.
Nothing to do with network traffic which also has more factors that occur outside your PC.
".....having the fastest CPU in the world will not give you faster Netflix streaming, network DB access, better looking TikTok videos etc. It might reduce your wait to render said video, but uploading it will still take the same time as it would on a 10 year old Core Duo. ..."
I think we can solve THAT ISSUE!
Quantum Teleportation aka Spooky Action At A Distance which allows a bunch of Xenon atoms trapped in a quantum well to be read in and written out at full-duplex PETABYTES PER SECOND data rates without being "trapped" by the quantum decoherence issue is one innovation coming from our Vancouver, British Columbia, Canada-based company soon enough.
Since we can print quantum wells into BOTH CMOS and GaAs substrates directly, we can embed high-speed communications to peripherals and external RAM right onto the processing chips themselves.
Since we have BOTH an in-house designed and built 575 TeraFLOPs per chip 60 GHz GaAs general purpose CPU/GPU/DSP and a TWO THz fully opto-electronic DSP-and-Math-oriented array processor (19.2 PetaFLOPS per chip!), we can (and DO!) embed external access to memory and storage via quantum teleportation means. You only need 8192 bits of trapped Xenon quantum wells to make a PETABYTES per second pipeline and that takes up a bare few square millimetres of on-chip real estate. The quantum decoherence issue caused by reading/writing quantum bits is SOLVED by letting the accumulation of errors BECOME the data transfer mechanism itself!
It's also the world's FASTEST wireless network system which basically has unlimited range (i.e. Quantum Teleportation propagates at 50000x the speed of light!) no matter how far apart the chips are and no matter the in-between terrain!
This will all be coming out soon enough for public sale along with a few of our other inventions such as our ultra high-end non-aerodynamic principles-based aerospace propulsion system. The fellow Black Budget World has had quite a few computing tricks up its sleeve for quite a while and VERY SOON NOW, we will come up from under-the-radar and go all public sale and disclosure with our in-house products.
Other companies in similar vein to ours ALSO have nearly the same types of products we do and they too are going "White Budget World" soon enough because of us. We'll see who shows their public cards FIRST!
YES! It can play Crysis --- That was ONE OF OUR VERY FIRST TESTS by the way!
The RISC chips emulate ENTIRELY IN SOFTWARE, the entire x86 instruction set in real time and we can run Crysis at max-everything settings at ridiculous frame rates. In internal bandwidth tests we simply stopped at 10,000 fps at 16,384 by 8,640 at 64-bit RGBA colour resolution (we modded the game internally to create new resolution and FPS limits!)
The super-cpu-chips are normally used for aerospace applications and are used in our "Infinite-Phyre" supercomputing system which is the REAL WORLD'S FASTEST SUPERCOMPUTER at 119 ExaFLOPS sustained using 128-bit floating point operations.
The yield and cost of production on the actual GaAs chips in 2020 is now at such great levels that we can sell the 60 GHz, 575 TeraFLOP 128-bits wide Combined CPU/GPU/DSP chip with onboard 3D stacked Petabyte Non-Volatile RAM memory for less than $3000 USD and STILL make a fantastic profit.
These chips are ENTIRELY "Designed and Made In Canada by Canadians" to ensure ITAR-free sales capability so EVERYONE in the world will be able to buy them. An original holding company in a foreign non-treaty aligned country (i.e. being the owner who subcontracted us to design and build) originated and holds all the Intellectual Property so we can export finished chips everywhere in the world without issue! We will OPEN-SOURCE the layout and tape-out files (i.e. the chips designs) and let ANYONE produce the chips if they have the expertise to do so. No royalties needed! We will then produce our own chips on our own lines for sale worldwide and competition is certainly welcome!
THIS TIME, no-one will be able to hog the technology and/or fortune as we WILL be giving away the IP away FREE AND OPEN SOURCE!
The supercomputing systems which we are running in both Vancouver and Northern British Columbia are running a WBE (Whole Brain Emulation) which simulates all the Sodium, Potassium and Phosphorous electro-chemical gating done in human neural tissue at a VERY HIGH FIDELITY.
We basically digitally "Grew" a human brain in a computer and "trained" it like we would a child 24/7/365 with synthetic inputs including vision, auditory, physically sensory (i.e. touch) and let it learn by itself. We even instilled EQ (Emotional Intelligence) to simulate various human emotional traits
including empathy, sympathy, cooperativeness, etc.
We estimate its current IQ at about 160 Human equivalent which makes it a super-intelligence. Then we put it to work on basic physics, quantum mechanics and chromodynamics, materials engineering, electrical power production systems, aerospace propulsion and medical systems research and development at NOBEL Laureate-levels of inquiry and end-results.
It has profited us GREATLY with new insights and breakthroughs that are STUNNING to say the least. Because of these new breakthroughs, we can now afford to introduce new products and systems that will pretty much OBSOLETE EVERYTHING already out there!
Most of the major stuff we will give away FOR FREE AS OPEN SOURCE designs and instructions.
The medical and scientific-oriented "Star Trek Tricorder" device is now coming sooner rather than later!
Coming soon to an online and real-world store near you!
P.S. Lithium-Ion and Aluminum-Air batteries ARE DEAD IN THE WATER !!! We have something MUCH MORE POWERFUL and much longer-lasting!
And, while Apple is insistent on having Thunderbolt 3 (of Intel fame) and USB4 on their computers for connectivity with the outside world, the rest of the industry is happy with USB3. The bandwidth of these is hugely different.
Granted, for the user they look the same :)
But you still have to get those terabytes of data into that pifflingly small amount of RAM and back out again...
Look at SAP HANA (oh, God, did I actually bring that up as an example? The shame!). That runs on huge machines, often with more than a terabyte of RAM. Our "relatively" small servers have replaced the old ones with 128GB RAM; the new ones have 512GB each. They also have SAS SSD SANs for storage, because what is on the storage and how quickly it can be retrieved into memory and written back out again are almost more important than the actual speed of the processor and the cache RAM.
Our clients still mainly have Core i3 processors and 4GB RAM, that is enough for Outlook and RDP, the "real" work is done on terminal servers and backend servers. And we spend a lot of time fine tuning that environment to get the most out of it. Moving to an SoC with onboard RAM isn't going to be useful, especially if we suddenly need to increase the RAM to cope with new loads.
With the servers, you chuck another couple of ECC DIMMs at the problem. If it was an Apple SoC, you'd have to "throw away" the whole thing and hope they have a bigger SoC that meets your requirements.
Don't get me wrong, I'm very impressed with what Apple have achieved, but in its current form, it is irrelevant to what I do on a daily basis, because I am stuck in a Windows and Linux world that needs heavy weight processors with lots of RAM.
I'll keep an eye on what Apple is doing, and for the user who can find all their software under macOS and will never need more than 16GB RAM, they are a great option. I will be very interested to see what they do for professional level devices and not just entry level devices. I think that is when we will see how this move is really going to pan out.
That's been tried, and it hasn't sold very much. Not really for any defect in ARM; if you throw 96 ARM cores at a single server, you're going to get some pretty good performance if your task can easily run on that many CPUs but couldn't be, or wasn't, ported to GPUs. However, it didn't achieve the speed increases that the M1 has, for many of the reasons stated in the article. Server ARM chipsets don't have memory inside the SoC, so they don't get the very fast transfers to and from memory. They are also able to handle more memory because it's kept separate, so everything's a tradeoff.
It really depends what you care about. I do some compute-heavy things on a local machine, so a processor that runs very fast is quite useful. Simultaneously, I don't need a lot of memory for those things, so an M1 with 16 GB of on-chip memory would probably be quite nice, and I'll have to consider it if my current machines need replacement (they don't yet). That said, many of my compute-heavy tasks aren't time sensitive, so although the M1 could probably do them faster, I don't need them to go faster right now. There are others for whom these advantages are less important. I don't really see much benefit in giving Apple's chip designs blanket praise for revolutionizing everybody or dismissing them as unimportant; both views are limited.
Remember that this is the chip for a couple of ultra-portable laptops and a small form factor desktop, not for a server running SAP HANA. Apple pulled out of the server market many years ago, and I'm pretty sure they will have something a lot more powerful for their workstation offering.
Yes, the M1 beats the pants off the 10900K in some workloads, and yes it does fall behind in others, but the 10900K is not competition for the MacBook Air or Mac Mini. The fact that you can make sensible comparisons between the two shows just how much they've done.
"When all you have is a hammer"
There is always more than one way to solve a task. In your example, one way is what you do: throw a "bigger" computer at the problem. Another approach is throw "more computers" at the problem.
The latter is what "supercomputers" do, for the very practical reason you just can't build a "very big" computer that can compete. It is also what "cloud computing" does - it runs on a lot of "smaller" computers.
Until now, the tools they had were power-hungry Intel CPUs, and let's not give Intel too much slack: not many years ago an entry-level Intel "server" could not have more than 32GB RAM (while a cheaper AMD server could be fitted with, say, 512GB). We also had ARM server chips, trying to emulate what the Intel chips were doing and trying to compete on cost (less profit).
Now Apple has demonstrated that a high-performance, high-integration SoC can be done.
SoCs are not new. Pretty much all microcontrollers around are of this kind. There are microcontrollers with wildly varying amounts of RAM/flash, I/O, CPU cores etc., every one highly optimized for its task. We take this for granted in the embedded world.
So what Apple has demonstrated is that you can have the same choice in the "desktop", and likely soon in the "server", world.
Remember, once upon a time the cache SRAM was a separate part you could replace in its own slot on the motherboard. Today nobody argues that cache SRAM should be user-replaceable, because having it integrated in the processor provides so many benefits and resolves so many issues.
Now back to your SAP example. I am sure whoever writes SAP code might one day experiment with a "server" that, instead of two-socket 28-core Xeons and 512GB of RAM, uses say eight 16-core M1-like SoCs with 64GB RAM each (the same 512GB total), with fast interconnects (we haven't seen this yet, as there is no use for it in a notebook), and with a total power consumption (SoCs with integrated RAM) of say 100W.
Do you think you will prefer such server to your current one?
If you can read from disk, process, and write back to disk before the user notices any hesitation, do you need more than 16GB of RAM?
Don't forget the M1s are reading ~3GB/s from their long-term storage (SSD/NVMe).
In all the tests I've seen, 8GB of RAM has proved adequate. I'd still get 16GB though.
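To put that ~3GB/s figure in perspective (taking the number above at face value), here is a rough sketch of how long it would take to re-read an entire working set from that kind of SSD:

```python
# Rough time to stream a working set back in from SSD (illustrative only).
def refill_seconds(working_set_gb: float, throughput_gb_s: float = 3.0) -> float:
    """Seconds to read `working_set_gb` sequentially at `throughput_gb_s`."""
    return working_set_gb / throughput_gb_s

print(round(refill_seconds(8), 2))   # entire 8GB of RAM: ~2.67s
print(round(refill_seconds(16), 2))  # entire 16GB of RAM: ~5.33s
```

Real swapping is random-access and much slower than this sequential best case, so these numbers are a lower bound, but they hint at why very fast NVMe makes a small RAM pool less painful than it used to be.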
One of the reasons Apple Silicon is so fast is that the RAM embedded in the SoC is shared between the CPU and GPU, which removes the need to move/copy data between the two.
The CPU and GPU have been sharing memory on iOS for years. It wasn't something that was visible in OpenGL because that has its own shit design issues to deal with, but in Metal as long as you obeyed certain alignment requirements memory was memory regardless of who accessed it.
To be fair, `glMapBufferRange` was introduced with OpenGL 3.0 and permits a GPU buffer to be exposed within the CPU's address space for direct access where hardware supports it. Though `GL_MAP_PERSISTENT_BIT` which asks for a persistent mapping — i.e. one that isn't invalidated the next time you issue a draw command — arrived only in OpenGL 4.4 and therefore has never been available on the Mac. The OpenGL 4.4 specification was announced in 2013, so marginally before Metal but after Apple stopped putting any effort into OpenGL.
But, yeah, it's another OpenGL-style workaround for a workaround.
As someone who has recently converted a pile of OpenGL code to Metal, the big wins for me were formalised pipelines, resolving the threading question, and getting to be fully overt about which buffers are ephemeral at the point at which they're either loaded or unloaded from GPU cache.
Apple's tooling for Metal is also leaps and bounds ahead of where its macOS tooling for OpenGL ever was, especially with regard to profiling, debugging, etc, so it's nice to have that supported in a first-party capacity but I think that's probably just a comment on Apple's lackadaisical approach to OpenGL over the years. Other OpenGL-supporting environments do a much better job here — even iOS had pretty cool frame capture/edit/replay facilities for GL ES back when iOS still supported that.
If one is going to compare APIs, better to compare like with like.
That means Vulkan vs Metal.
Metal is different to Vulkan for the sake of it, purely to lock you into Apple. There's no technical reason for Metal to exist whatsoever - Apple stayed in the working group just long enough to steal most of the ideas ATI had put forward, then withdrew and spent the next few years creating Metal.
I was refuting the claim that shared buffers are not something you can do with OpenGL. Though I neglected one important caveat: in Metal you can share texture storage and even compel the GPU to use linear ordering to avoid swizzling costs, if you know that that's the correct trade-off for you. I do not believe you can do this in OpenGL.
Your theory about Apple is trivial to discount, however:
The first meeting ever to discuss Vulkan happened in July 2014, and the call to form a working group happened in August. Metal was first released in June 2014.
So it is trivially false that "Apple stayed in the working group just long enough to steal most of the ideas ATI had put forward" — there was no Vulkan working group until after Metal had launched and Apple was never a member. For that reason one can also immediately discount the claim that "Metal is different to Vulkan for the sake of it".
Metal takes AMD's ideas from Mantle and adapts them to something that works across AMD, Intel and Apple's homespun GPUs. Wishing that Apple wouldn't be so quick to go it alone and so reticent to adopt a later standard is valid; alleging a weird conspiracy doesn't really stand up.
Oh, well on iOS you could have used the wilfully obscure CVOpenGLESTextureCache to share CoreVideo buffers (which can just be BGRA) between CPU and GPU without copying. But it doesn't guarantee no conversions or reorderings, it just minimises them.
... and it's not even Mac/iOS cross-platform. It's iOS only.
IIRC we avoided that simply by only having to deal with data coming out of the video decoder. By setting up the config correctly, the decoder would output Metal-friendly buffers to which we could map textures directly.
In really high performance computing, which is massively parallel, the speed of moving the data between nodes is a key performance metric, quite apart from the speed of the cores.
You can see the state of the art www.top500.org
There are a number of things involved (latency is a key issue) but in those environments there is a huge amount of data being shovelled around.
It's pretty hilarious that there is literally nothing Apple can do that won't have somebody's panties in a bunch. They could literally create a cure for cancer and somebody would get mad about it. It is obvious to everybody + dog that what they are doing now with these chips is the most exciting thing that's happened in this space for ages and will probably push the entire industry ahead when others have to compete with them. It's a win-win situation for everybody, even if you don't want to use a Mac.
> They could literally create a cure for cancer and somebody would get mad about it.
Yeah because their cure for cancer would involve visiting an overly expensive iDoctor, could only be administered using an iNeedle and require you to take an iTablet for the rest of your life (which would turn out to be basically candy corn & Zima).
You are right; it's something that people have been predicting for many years. At some point, "mobile" technology was bound to be sufficient to run a "desktop" computer. The difference - as is often the case with Apple - is that nobody else has quite managed to line all their ducks up and get it all right at the same time. My own opinion is that the main thing holding this development back is that Windows has proven time and time again to be totally unsuited to running on this kind of system. MacOS or whatever it's called these days has been fine tuned over the last few years - probably using experience from iOS and definitely using Apple's experience of previous architecture switches (68k -> PPC -> x86) - and was anyway a much better base to start with. I suppose you could compare it with the optimised Linux systems that have been crafted to run on low-resource computers such as the Raspberry Pi, though the M1 is hardly "low resource" by that standard!
The only thing that Apple never gets "right" is the price - but that's from my point of view, and there are plenty of people out there willing to pay Apple prices. Sometime in the not-too-distant future some other manufacturer will come along with a similar device, cheaper. It won't be running MacOS, so the question is what will it run, and would the great unwashed buy something that doesn't run Windows?
It really doesn't matter if Apple's new computers are "non-upgradable". No, I wouldn't buy one on those grounds alone (let alone the price) but at the moment they give all the desktop computer that most people will need. The really interesting question is whether someone can make an affordable "M1 clone" computer that is upgradable. Maybe have 4GB in-package RAM and an external memory bus for expansion?
Reminds me somewhat of the 1980s again - silly things like the 256 bytes of "page zero" on the 6502 which could be accessed much more quickly than the rest of the memory map, or even the differences between "Chip RAM" and "Fast RAM" on the Amiga.
Apple's notebooks have had their RAM chips soldered to the motherboard for years.
So for their users there is no difference now that the RAM chips are soldered inside the SoC package.
When vendors can supply Apple with denser RAM parts, we will sure see 32GB, 64GB RAM SoCs etc.
It should be trivial to run open source OSes on the M1, provided Apple permits bare-metal loading.
On the other hand, current MacOS has hypervisor calls built in, so you can create VMs and "boot" pretty much any ARM OS. There are people who already made ARM Windows run on it and that already emulates x86 code by itself. So those who need to run Windows on the new M1 Macs can already (technically) do it... if and when Microsoft decides to sell licenses for it, that is.
Fanboyism is not something new. Youngsters here can't recall, but I can remember the last half of the 1970s - you know, the time of the start of the "Microcomputer Revolution." The 8080 processor from Intel had its fanboys, who laughed at the simplified 6502 from MOS Technology, and both were looked at with disdain by dudes using the later Z80 from Zilog. Just as today, no opinions, however reasonable or rabid, ever convinced anyone in the other groups.
Enjoy the ruckus from the sidelines while you are using your own favorite gear.
The old RISC vs. CISC wars of the 80's..... and 90's.
But the RISC fanbois actually won that one. As the article alluded to:
"no processor has run x86 code natively for decades, there's always a much more efficient inner core chewing through microinstructions after the x86 code has been decoded and stripped down." That inner core on x86 is a RISC core.
ARM is RISC, RISC-V is RISC (duh), Power is RISC, x86 is RISC (cores). There isn't much outside special-purpose limited run custom logic that isn't RISC.
I recall a bit of common wisdom from circa 1990: The 80486 was the best CISC CPU ever, and the i860 was the worst RISC design ever, but the 860 still outperformed the 486. (I said it was common; I didn't say it was right. But there was a grain of truth in it: despite its design flaws, the 860 managed 5-10 times the MFLOPS of the 486, so if floating-point was what you wanted...)
I believe the IBM z10 was still true CISC, dispatching the actual zArchitecture CISC instructions to the cores (based in part on this IJRD article).
That was 2009, though. The current z CPU is the z15, and this writeup mentions "CISC instruction cracking" in one of the illustrations, which certainly sounds like the pipeline is decoding CISC instructions into simpler ones.
That would also make sense because z10 is superscalar but in-order, while z15 is out-of-order. It's generally easier to reorder RISCy instructions.
z has over a thousand opcodes, between the public instructions and the special ones used in microcode. Going to RISC cores was probably inevitable. z10 cores were big - a thousand opcodes means a lot of gates.
Biting the hand that feeds IT © 1998–2021