Nvidia's Kepler pushes parallelism up to eleven

When Nvidia previewed its next-generation "Kepler" GPU chips back in March, the company's top brass said they were saving some of the goodies in the Kepler design for the big event at Nvidia's GPU Technology Conference in San Jose, which runs this week. And true to its word, the Kepler GPUs do have some goodies that …

COMMENTS

This topic is closed for new posts.
  1. Michael H.F. Wilkinson
    Thumb Up

    Interesting developments, in particular the ability to have multiple MPI tasks running simultaneously on a single Kepler chip. Great added flexibility.

  2. Richard Boyce

    Power vs clock speed

    Interesting. However, one thing early on bothered me. If power scales as only the log of the clock speed, that would reduce the sensitivity of the power consumption to clock speed, not increase it. Wouldn't it? So I suspect it may be the other way around, that the log of the power consumption scales with the clock speed.

    1. diodesign (Written by Reg staff) Silver badge

      Re: Power vs clock speed

      Yes, you're not the only one to spot that. The article has been amended.

      C.
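The scaling the commenters settled on can be illustrated with the textbook dynamic-power model P ≈ C·V²·f, in which supply voltage tends to rise roughly linearly with clock frequency, making power grow roughly with the cube of the clock. All constants in this sketch are illustrative, not measured Kepler figures:

```python
# Rough dynamic-power model: P ~ C * V^2 * f, with supply voltage V
# scaled roughly linearly with frequency f (a common simplification).
# All constants here are illustrative, not measured GPU figures.

def dynamic_power(freq_ghz, base_freq=1.0, base_voltage=1.0, capacitance=1.0):
    """Estimate relative dynamic power at a given clock frequency."""
    voltage = base_voltage * (freq_ghz / base_freq)  # V scales ~linearly with f
    return capacitance * voltage**2 * freq_ghz       # P ~ C * V^2 * f

# Doubling the clock roughly octuples power under this model (2^3 = 8):
print(dynamic_power(2.0) / dynamic_power(1.0))  # -> 8.0
```

This is why chip designers prefer adding cores at a modest clock over cranking the clock: the power bill rises much faster than linearly with frequency.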

  3. druck Silver badge
    Unhappy

    DP bad

    Why the massive hit on double precision in the K10 compared to the Fermi? Is there a technical reason, or is it just to force DP users onto the more expensive K20?

    1. Anonymous Coward
      Anonymous Coward

      Re: DP bad

      It's not just DP that's bad on the Kepler K10; you also have 50% less memory per GPU. While there may be more memory per board, I don't believe the GPUs share memory (i.e. all connectivity between the GPUs is via a PCIe bridge), so you get a choice of hitting main memory via PCIe or the second GPU via PCIe if you need to run larger data sets.
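The tradeoff the poster describes can be put in rough numbers. Both bandwidth figures below are ballpark assumptions for illustration, not vendor specs; the point is only that a GPU's local GDDR5 is more than an order of magnitude faster than reaching data across a PCIe bridge:

```python
# Hypothetical transfer-time comparison: local GDDR5 vs crossing PCIe.
# Both bandwidth figures are assumed, order-of-magnitude numbers.
local_gbps = 160.0   # assumed per-GPU GDDR5 bandwidth, GB/s
pcie_gbps = 8.0      # assumed effective PCIe x16 bandwidth, GB/s

data_gb = 2.0        # working set that spills past one GPU's local memory

t_local = data_gb / local_gbps   # time if the data fits locally
t_pcie = data_gb / pcie_gbps     # time if it must cross the bridge

print(round(t_local, 4), round(t_pcie, 4))  # -> 0.0125 0.25
```

Under these assumed figures the spill path is 20x slower, which is why halving the memory per GPU hurts large working sets far more than the board-level total suggests.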

  4. Tom Womack
    Boffin

    You've got cores and core-groups confused at the start; you write

    The Fermi GPU had 512 cores, with 64KB of L1 cache per core and a 768KB L2 cache shared across a group of 32 cores known as a streaming multiprocessor, or SM

    where in fact there is a single 768KB L2 cache shared between all 512 cores, and 64KB L1-like memory shared across each SM.

    'The Fermi GPU has sixteen streaming multiprocessors, each comprising 32 cores and 64KB of fast memory, and a 768KB L2 cache shared by the sixteen SMs' would be a more correct way to put it.
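The corrected layout is easy to sanity-check with a little arithmetic, using the figures from the comment above:

```python
# Fermi layout as described in the correction above.
sms = 16            # streaming multiprocessors
cores_per_sm = 32   # cores per SM
l1_kb_per_sm = 64   # fast L1/shared memory per SM, in KB
l2_kb = 768         # single L2 cache shared by all 16 SMs, in KB

total_cores = sms * cores_per_sm
print(total_cores)  # -> 512, matching the article's core count
```

The totals line up: 16 SMs of 32 cores each give the 512 cores the article cites, with the 768KB L2 being one chip-wide cache rather than one per SM.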

