Good article, but how many frames per second does it get playing Crysis?
Nvidia pitches its Tesla hardware as a magical solution for the world’s toughest computing problems. Just move your code that runs well across many processors over to the Tesla boards, and Shazam! You sometimes enjoy 400 per cent improvements in overall performance. Despite such mind-blowing increases in horsepower, Tesla …
...not only when you flunk it.
There used to be a company called Floating Point Systems that did this kind of thing for HPC applications way back when. They existed in a niche market for a while, until the mainstream HPC manufacturers produced machines that did not require knowledge of how to program a VLIW box and delivered better performance. In the meantime, the guy who started FPS drove around in a high-end sports car watching his radar detector.
As I recall, similar efforts also had a brief flash of life as add-in cards for the PC when it first came out. My thought is that this will be another short-lived success.
What if we start pushing more operations to specific chips like GPUs? I mean something meaningful like video encoding, cryptography, compression, not Aero ;)
It is history repeating itself - anybody remember coprocessors in the 386 era? We have got to the point where the x86 architecture is slowing us down - if nVidia and ATI go their own way, there's a big chance we will find PowerPC, ARM or some other RISC chip in our PCs at some point, nicely coupled with a GPU and some other specialised processing units, sound cards maybe? That would be good, albeit harder to program. But I believe that if it becomes standard, compilers will take on some of the optimization load, making the whole experience nicer.
@AC, the original Amigas didn't have an FPU built into their 68000 CPU, so FPU add-on boards were developed by 3rd parties to help with 3D rendering, which requires a lot of floating point precision. Later CPUs in the 680x0 family had FPUs built in, except for the economy (EC) versions.
Perhaps you were thinking of the transputer expansion system for the Amiga? One board contained a transputer which would work in parallel with other transputer boards adding up to a seriously powerful rendering machine (for the time).
"As I recall, similar efforts also had a brief flash of life as add-in cards for the PC when it first came out. My thought is that this will be another short-lived success." = Iain B. Findleton
Excuse me. This is a reworked video card, and these features will for the most part be available to anyone who has a recent-vintage Nvidia or ATI video card.
Up until now, the computing power in these video chipsets has been sitting idle on the PC if it's not being used for rendering. Now the video card manufacturers are exposing the computational power of these cards to the open world.
NVidia's solution appears to be quite stark compared to the drive by AMD with their integrated video processor set from ATI.
Intel is in the best position in terms of design. But AMD is far ahead in terms of implementation.
The only question that remains is how the Apes are going to use 1 teraflop of computing horsepower...
Generating Sudoku puzzles? Playing Solitaire? Or selling those CPU cycles to large-scale computational projects?
The answer is obvious.
@Gary F & AC
I believe AC is referring to the Agnus, Denise, and Paula chips that unloaded such tasks as RAM management, bit-blitting, audio, and video data transfer. It was possible to do a lot of amazing tricks (for the time) with creative programming of the "Copper" in the Agnus chip.
Later Amigas got upgraded custom chips as well as CPUs with an FPU.
I think AC's point was that, after all this time, the x86 world has finally found out that co-processors are the way to get more speed, etc.
Sure, the x86 has had forms of bolt-on co-processing, just as the Amiga had those cool Transputer boards and FPGA Zorro plug-in boards that Autoconfig just worked with all those years before ;)
Full circle, as it were, and full credit where it's due: to the Transputer chip design, and to the Amiga's all-inclusive onboard co-processing for the masses.
Add-on boards are fine, but you can't beat a fully included set of co-processors, onboard and standard for use by anyone from day one, to set the standard and push the innovation.
The marketeers seem to have forgotten that all-inclusive approach to growing your real long-term market share, and prefer the short-term "everything's an add-on" model, so no one takes the standard and pushes it forward as a whole system...
Revolving door, a cycle...El Nino-vidia, if you will. Eventually the current trend towards specialised chips will revert to massive single core CPUs in the guise of less power, cheaper, easier to maintain/program for. And then 15 years later, there will be yet another mass schism just like we're looking at right now.
Howzat phrase go...everything old is new again?
"The consensus seemed to be that it takes about a month to learn the CUDA nuances and tweak code for the GPGPUs."
"About a month", eh? Are you serious? The amount of time to learn CUDA may be constant, but the amount of time taken to port (not "tweak") code to CUDA depends on how much code there is and how hard the "tweaks" are going to be. Often a complete refactoring is required which can take years for a complex engine. The phrase "about a month" makes CUDA look good but it's in the top percentile of meaningless statistics.
As for Moore's Law, it has not been "blown away" or any other such nonsense. Moore's Law refers to the density of transistors on chips, and with this nVidia are significantly BEHIND the curve.
I liked the breathlessness of the article though. nVidia fanboyism is clearly alive and well at El Reg.
I did my master's thesis in High Performance Computing, comparing NVIDIA's CUDA with an FPGA-based solution for several different applications including Monte Carlo simulation, FFT, seismic analysis and 3-D imaging.
All these are ideal applications for parallelism. The FPGA solution (a big Xilinx V5) outperformed a traditional CPU implementation by between 2x and 300x depending on the application (Monte Carlo 300x, very small FFT 2x). This was quite impressive. I then tried the CUDA implementation (GeForce 8800 Ultra). It wiped the floor with the FPGA on all four applications.
This was a bit of a shock for the FPGA company I was working for as the GPU was about a tenth of the price of their implementation. Their argument was that the FPGA could do double precision arithmetic (very badly). That argument is now blown out of the water if CUDA now supports double precision.
This product will revolutionise the HPC industry. It is ridiculously easy to code in C, costs buttons and outperforms anything else in the industry. If you work in HPC consider this product.
*Note: I am in no way affiliated to Nvidia, I currently work on FPGAs (very small V4's) for consumer electronics so I am nowhere near the HPC industry any more and have never been involved in the graphics industry. This product just amazed me.
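For anyone wondering why Monte Carlo simulation, mentioned above, parallelises so well: every sample is independent of every other, so the work splits into chunks that never need to talk to each other. Below is a minimal Python sketch of that idea, estimating pi. It is not the thesis code; the function names, chunk counts and seeds are all hypothetical, and the chunks run serially here where a GPU would run them side by side.

```python
import random


def count_hits(seed, samples):
    """Count random points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)  # private generator per chunk: no shared state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(chunks=8, samples_per_chunk=50_000):
    """Estimate pi from independent chunks (hypothetical sizes).

    Each call to count_hits depends only on its own seed, so the chunks
    could be handed to separate cores or GPU threads. Here they simply
    run one after another and the partial counts are summed at the end.
    """
    hits = sum(count_hits(seed, samples_per_chunk) for seed in range(chunks))
    return 4.0 * hits / (chunks * samples_per_chunk)
```

The only communication is the final sum of the per-chunk counts, which is why this kind of workload tends to scale with however many processors you throw at it.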
It wouldn't have been a big deal to actually go and interview some heavy duty research freaks from a UNI. Ask them what they thought of the performance and bang for buck when it comes to calculating how many spheres fit into a cube or whatever.
Surely the opinions of people who have to do hard sums every day are better than the delusional ramblings of a bunch of old men fondly remembering a child's toy.
I have been fooling around with the CUDA SDK for a few months, and reviewing the on-line lessons. Having been involved in parallelism since the Thinking Machines, nCube and MasPar days, I can see there are both similarities and differences between what we have with the CUDA offerings and prior attempts.
The first difference is economic - CUDA is built entirely from consumer-grade offerings and parts, and therefore shares the R&D and production efficiencies that Nvidia gets from their graphics chip business for free. Unlike Thinking Machines and the rest, there isn't even any proprietary switching technology to be built around it, or expensive packaging. The only real cost has been the development of the CUDA SDK, and that is minor in comparison. IMHO, this is why CUDA is set for success - because it IS a low-cost offering that has a high degree of scalability and great performance on the right problems.
The second difference, however, is a negative. The Single Program Multiple Data architecture of CUDA is a bit more flexible than SIMD, but it is still more limiting for general purpose problems than a full MIMD MPP machine. In addition, there are practical problems with respect to the maximum size and execution windows of CUDA kernels that also limit its applicability to general purpose problems. So while many of the older massively parallel machines were adapted to handle some commercial processing (database programs for the CM-5, Oracle running on nCube, etc.), it is clear that CUDA will only be a very fast number cruncher. Not that that is a bad thing, but it should at least be noted.
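To make the Single Program Multiple Data point concrete, here is a rough Python emulation of the model, not real CUDA: one kernel function is run once per (block, thread) pair, and each instance works out which element it owns from its indices. The names `launch`, `saxpy_kernel` and `saxpy` are made up for illustration, and the loops run serially where the hardware would run the instances in parallel.

```python
def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    """The single program: every emulated thread runs this same body."""
    i = block_idx * block_dim + thread_idx  # global index owned by this thread
    if i < len(x):  # guard: the last block may have spare threads
        out[i] = a * x[i] + y[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a kernel launch by visiting every (block, thread) pair in turn."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def saxpy(a, x, y, block_dim=4):
    """Compute a*x + y element-wise using the emulated launch."""
    out = [0.0] * len(x)
    grid_dim = (len(x) + block_dim - 1) // block_dim  # round up to cover all elements
    launch(saxpy_kernel, grid_dim, block_dim, a, x, y, out)
    return out
```

Every instance executes the same code on different data, which is exactly what suits number crunching and what makes it awkward for workloads where each processor wants to run a different program, as on a MIMD machine.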
CUDA is both harder and easier to learn than its competition. The libraries are well organized and very accessible. Nvidia has partnered with universities to ensure that courseware is available on the web. It's just C calls. BUT - the memory management considerations are real, and optimizing for memory banks, memory layers, latency, etc. all requires some thinking. As one of the U of I instructors said in the courseware - "CUDA is easy to program for if you don't care about performance"... But getting good performance takes work and a very detailed understanding of the underlying architecture and its limitations, especially for non-trivial applications spanning several GPUs and kernels. It's hardly a walk in the park, and I suspect that it will always be better suited for certain types of problems than others.
Still, Nvidia have produced something of great value from what is essentially a consumer-grade graphics chip. As long as they can continue to improve the libraries and leverage the economics of the consumer business, CUDA should have a long lifespan in certain sectors...
Was looking into CUDA a few weeks back, and was very impressed by some of the applications already available for the smaller-time user too, not just the HPC crowd. The DVD encoding applications etc. show an immense speed improvement over anything running on a normal desktop CPU. OK, this is all still new with little support, but beyond the HPC crowd, big number crunching such as encoding could become quite useful.
Thumbs up to nVidia, will be interesting to watch how this all develops in the coming years.
Far from being negative about the NVidia product, I had occasion to touch one just last week. A truly fine looking box, although the PCIe cable was a bit intimidating. I would personally love to have one connected up to one of my personal machines, and I could probably even find a good use for it. Even the in-box card model is pretty nice sitting in a tower, and would certainly speed up some of the image processing stuff I do in my spare time.
No doubt about it, NVidia has done a good job of consumer appeal here, and I suspect a lot of single application small labs could make good use of them. Really massive deployments, however, would be pretty ugly looking and a cabling nightmare in the current product context. I would be surprised to see the arrival of a large super-computer configuration based on this thing in its current format.
Ah Paula, Agnus and Denise - a tear of nostalgia rolls down my cheek...
Indeed, they did get upgraded in the later models, such as the Amiga 1200 and 4000, which had the AGA graphics architecture. This allowed the new 8-bit 256-colour modes, and the incredible hack called Hold And Modify, which calculated the colour of each pixel as a mathematical difference from the preceding pixel, allowing the entire 16-bit palette to be used on-screen.
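For the curious, the Hold And Modify trick can be sketched in a few lines of Python. This follows the commonly documented 6-bit variant (HAM6) as I understand it: the top two bits of each pixel either load a colour from a 16-entry palette or hold two channels of the previous pixel and replace the third with the low four bits. Treat the function name, the palette and the starting colour as assumptions for illustration, not a reference decoder.

```python
def decode_ham6(pixels, palette, background=(0, 0, 0)):
    """Decode one scanline of 6-bit HAM pixel values into (r, g, b) tuples.

    Assumed layout: bits 5-4 are the control code, bits 3-0 are data.
    Each channel is 4 bits wide (0-15). The line starts from `background`.
    """
    r, g, b = background
    decoded = []
    for value in pixels:
        control = (value >> 4) & 0b11
        data = value & 0b1111
        if control == 0b00:
            r, g, b = palette[data]  # set: load a full colour from the palette
        elif control == 0b01:
            b = data                 # hold red and green, modify blue
        elif control == 0b10:
            r = data                 # hold green and blue, modify red
        else:
            g = data                 # hold red and blue, modify green
        decoded.append((r, g, b))
    return decoded
```

Because only one channel can change per pixel outside of the palette entries, sharp colour transitions can take a few pixels to settle, which is where the characteristic fringing in HAM images comes from.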
If memory serves me correctly, the motherboards were all named after B-52's hits (the 1000 being Love Shack, and the 500 being Rock Lobster), while the custom chips were all named after the founders' girlfriends (Amiga itself being Spanish for 'girlfriend').
A little disturbing then, that the later chips became Fat Agnus, and Gary. But I digress.
There will always be a swing between separates and onboard, just as we repeatedly switch between parallel and serial buses in our constant striving for speed improvements. I believe it was when AMD integrated the memory controller into the Athlon 64 that they finally eked out a performance advantage over Intel's rival silicon.
It goes to show that the underlying design principle doesn't really matter; it's down to the practical implementation at the end of the day. I strongly suspect that after the industry has exhausted the capabilities of the SATA standard, its replacement could very possibly be a parallel solution again.
When the next big IC revolution hits (which is starting to look like nano component design) there will be a huge re-integration of all these disparate components.
back to my digression...
When the Amiga crashed, it used to reboot to a black screen, with a red flashing box at the top, exclaiming "Guru Meditation" followed by a reference number.
The Amiga founders used to make controls for huge arcade games, one of which was a full-sized surfboard. When they tried to get it working with their precious new Amiga, it crashed, flickering colours across the screen when the board moved.
Being new-age kinda guys, they then proceeded to meditate on said board: the less flicker, the more centred their karma. Shortly after, the term made its way into the firmware.
"But, er, Larrabee is just slideware for the moment, and it’s hard to win developers’ hearts and minds even if you give the best slide."
Bollocks, good sir. If you are developing for any of these GPU platforms you are either doing essentially-branch-free linear algebra or you are an idiot. For any *reasonable* problem, the fastest way to bring a solution to market starting today is to sit on your arse surfing El Reg until Intel deliver Larrabee.
Unlike the IA64 project (which required breakthroughs in compiler technology that had defeated researchers for forty years) Larrabee requires no great steps forward. It is "just a matter of fabrication" and whatever else you might say about them, Intel are *good* at fabrication. Larrabee *will* happen and roughly on Intel's stated timescale.
I believe none of us needs such computing power, other than research institutes that use it for scientific calculations. Another thing is that none of us could afford it, nor will millionaires waste their money on a thing which will be obsolete in a year or so.
They should focus on making low-cost, high-efficiency, low-heat, low-energy GPUs that meet end users' needs, to keep the company moving forward and match the big boys' competitive edge.
"I believe none of us needs such computing power, other than research institutes that use it for scientific calculations. Another thing is that none of us could afford it, nor will millionaires waste their money on a thing which will be obsolete in a year or so.
They should focus on making low-cost, high-efficiency, low-heat, low-energy GPUs that meet end users' needs, to keep the company moving forward and match the big boys' competitive edge."
HUH? Such computer power is simply a byproduct of the most desired computer add-on on the market today: a 3D graphics card that can be used to play the most powerful games at fast framerates (OK, except Crysis..lolz). The ONLY reason that this is so heat and power intensive is THAT is exactly what you need to play games well - their suitability for scientific computing is merely a byproduct of having so many good math processors on a single chip. But as 3D graphics IS all math calculations anyway, that just makes sense.
And your line about "millionaires" is just plain insane - these cards cost hundreds or thousands of dollars, not millions. Buy a mobo with three PCI-E 16x slots, and put three 9800GTX cards in (with a big power supply) for under three thousand dollars and you have a supercomputer...