Interesting developments, in particular having multiple MPI tasks able to run simultaneously on a single Kepler chip. Great added flexibility.
Nvidia's Kepler pushes parallelism up to eleven
When Nvidia did a preview of its next-generation "Kepler" GPU chips back in March, the company's top brass said that they were saving some of the goodies in the Kepler design for the big event at Nvidia's GPU Technology Conference in San Jose, which runs this week. And true to its word, the Kepler GPUs do have some goodies that …
-
Tuesday 15th May 2012 22:16 GMT Richard Boyce
Power vs clock speed
Interesting. However, one thing early on bothered me. If power scaled only as the log of the clock speed, that would make power consumption less sensitive to clock speed, not more. Wouldn't it? So I suspect it may be the other way around: that the log of the power consumption scales with the clock speed.
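To make the sensitivity point concrete, here's a rough worked sketch. It leans on the standard CMOS dynamic-power rule of thumb, which is my assumption and not something the article states:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
If power grew only with the log of clock speed $f$, its sensitivity to $f$ would
\emph{shrink} as clocks rise:
\[
  P \propto \log f \;\Longrightarrow\; \frac{dP}{df} \propto \frac{1}{f}.
\]
The usual CMOS dynamic-power rule of thumb points the other way: with supply
voltage $V$ having to rise roughly in step with $f$,
\[
  P \approx \alpha\, C\, V^{2} f \;\propto\; f^{3},
\]
so power climbs steeply with clock speed.
\end{document}
```

Either way, the steep-growth reading fits why vendors keep adding cores rather than megahertz.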
-
-
Thursday 17th May 2012 06:13 GMT Anonymous Coward
Re: DP bad
It's not just DP that's bad on the Kepler K10; you also have 50% less memory per GPU. While there may be more memory per board, I don't believe the GPUs share memory (i.e. all connectivity between the GPUs is via a PCIe bridge), so if you need to run larger data sets you get a choice of hitting main memory via PCIe or the second GPU via PCIe.
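To make those two paths concrete, here's a rough CUDA sketch. The device numbering, buffer sizes and names are my own illustrative assumptions, not anything from Nvidia's K10 documentation; the point is simply that both routes to "extra" data cross PCIe:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 256ull << 20;        // 256 MB working set, illustrative
    float *d0, *d1, *host;

    cudaSetDevice(0); cudaMalloc(&d0, bytes);
    cudaSetDevice(1); cudaMalloc(&d1, bytes);
    cudaMallocHost(&host, bytes);             // pinned host buffer

    // Path 1: pull the data from host memory across PCIe.
    cudaSetDevice(0);
    cudaMemcpy(d0, host, bytes, cudaMemcpyHostToDevice);

    // Path 2: pull it from the second GPU's memory; on a dual-GPU board this
    // also rides the on-board PCIe switch, since the two GPUs don't share RAM.
    int can_peer = 0;
    cudaDeviceCanAccessPeer(&can_peer, 0, 1);
    if (can_peer) {
        cudaDeviceEnablePeerAccess(1, 0);     // let device 0 reach device 1 directly
        cudaMemcpyPeer(d0, 0, d1, 1, bytes);
    }

    printf("last error: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(d0);
    cudaSetDevice(1); cudaFree(d1);
    cudaFreeHost(host);
    return 0;
}
```

So whichever path you pick, a working set bigger than one GPU's memory ends up bandwidth-limited by PCIe rather than by the GDDR5 on the card.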
-
-
Wednesday 16th May 2012 12:19 GMT Tom Womack
You've got cores and core-groups confused at the start; you write
The Fermi GPU had 512 cores, with 64KB of L1 cache per core and a 768KB L2 cache shared across a group of 32 cores known as a streaming multiprocessor, or SM
where in fact there is a single 768KB L2 cache shared between all 512 cores, and 64KB of L1-like memory shared within each SM.
'The Fermi GPU has sixteen streaming multiprocessors, each comprising 32 cores and 64KB of fast memory, and a 768KB L2 cache shared by the sixteen SMs' would be a more correct way to put it.
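For anyone wondering where that 64KB of per-SM fast memory shows up in practice, here's a minimal CUDA sketch (kernel, names and sizes are my own illustration, not from the article): the 64KB is carved into L1 cache plus __shared__ storage, and cudaFuncSetCacheConfig lets a kernel bias the split one way or the other.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Purely illustrative kernel: stage a copy through shared memory so that the
// tile occupies part of the SM's 64KB of fast memory.
__global__ void stage_copy(const float *in, float *out, int n)
{
    __shared__ float tile[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tile[threadIdx.x] = in[i];
    __syncthreads();
    if (i < n) out[i] = tile[threadIdx.x];
}

int main()
{
    const int n = 1 << 20;
    float *in, *out;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));

    // Bias the per-SM 64KB split toward shared memory (48KB shared / 16KB L1 on
    // Fermi); cudaFuncCachePreferL1 flips it to 16KB shared / 48KB L1.
    cudaFuncSetCacheConfig(stage_copy, cudaFuncCachePreferShared);

    stage_copy<<<n / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

That per-SM view is exactly why the article's "64KB of L1 cache per core" wording is misleading: the 64KB belongs to the SM and is shared by its 32 cores.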