shift in the dialog
Could we maybe shift it back? It's really hard to filter out any information from all the marketing shit.
Imagination Technologies has revealed a pair of new GPU IP cores that it claims will supply new heights of performance to the likes of Apple, TI, Samsung and others who use the UK company's low-power graphics cores. "You are now talking about, from a gigaflop point of view, supercomputing in hundreds of milliwatts," …
The real problem is that between the very real read-back latencies to the GPU and the hell of a time you have getting all those nice SIMD ALUs to take more or less the same branches at the same time, you have to expend a horrific amount of development effort to actually get this to scale without hitting one of, say, 16 bottlenecks. And it's not even clear that we are truly doing everything we could with SSE (or NEON) plus some cores.
Some things will go this way... but it'll take time and stay DSP-like and niche for a long while. Part of the reason the cache-sharing debate is important is that CPUs can sometimes outperform GPUs on high-locality operations (e.g. working sets < 64K) that can be stored in-cache. GPU caches also tend to be skimpy on the low end, and if you have a shared bus the GPU really can't clock the data out any faster than the CPU can. So a distinct bus plus distinct memory means improved performance.
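To make the bus argument concrete, here's a toy cost model. Every constant in it (bus bandwidth, FLOP rates, launch latency, arithmetic intensity) is a made-up illustrative number, not a real hardware spec; the point is only the shape of the trade-off: the GPU has to amortise the transfer and launch cost, so small in-cache working sets stay faster on the CPU.

```python
# Toy offload model: hypothetical numbers, illustrative only.
BUS_GBPS = 2.0          # assumed shared-bus bandwidth, GB/s
CPU_GFLOPS = 4.0        # assumed CPU (scalar + SIMD) throughput
GPU_GFLOPS = 40.0       # assumed GPU throughput (10x the CPU)
FLOPS_PER_BYTE = 10.0   # assumed arithmetic intensity of the workload
LAUNCH_S = 50e-6        # assumed fixed kernel-launch / read-back latency

def cpu_time(nbytes):
    # Data already sits in the CPU cache: pay for compute only.
    return (nbytes * FLOPS_PER_BYTE) / (CPU_GFLOPS * 1e9)

def gpu_time(nbytes):
    # Pay the fixed latency, ship data over the bus both ways, then compute.
    transfer = 2 * nbytes / (BUS_GBPS * 1e9)
    compute = (nbytes * FLOPS_PER_BYTE) / (GPU_GFLOPS * 1e9)
    return LAUNCH_S + transfer + compute

for kb in (16, 64, 1024):
    n = kb * 1024
    winner = "CPU" if cpu_time(n) < gpu_time(n) else "GPU"
    print(f"{kb:>5} KiB working set: {winner} wins")
```

With these particular made-up numbers the crossover lands around the 64K mark mentioned above; change the intensity or latency and it moves, which is exactly why the answer is workload-dependent.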
Personally I don't think that this will really become mainstream until there is actually a unified set of SIMD/MIMD cores behind the same cache and an ISA that can rapidly switch between MIMD and SIMD types of operations. Oh, and the CPU needs to be able to read swizzled texture memory.
Not just anything, only tasks suited for it, i.e. independent calculations and parallel logic. Normal user software has a hard time scaling on multiprocessors, for the exact reason that the task at hand is largely sequential. Graphics work and specific maths (like photo/video processing) are very parallel in nature (the reason SIMD was invented) and thus can move to the GPU.
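The distinction is easy to see in code. Both functions below are hypothetical examples: the first is data-parallel (each output depends only on its own input, so the work maps onto SIMD lanes or GPU threads), while the second has a loop-carried dependency that forces one-at-a-time execution no matter how many lanes you have.

```python
# Data-parallel: each pixel is independent of every other pixel,
# so the elements can be computed in any order (SIMD/GPU-friendly).
def brighten(pixels, amount):
    return [min(255, p + amount) for p in pixels]

# Inherently sequential: each step needs the previous result,
# so there is no independent work to spread across lanes.
def running_balance(transactions, start=0):
    balances = []
    total = start
    for t in transactions:
        total += t          # loop-carried dependency
        balances.append(total)
    return balances

print(brighten([10, 250, 100], 20))   # -> [30, 255, 120]
print(running_balance([5, -3, 10]))   # -> [5, 2, 12]
```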
Long ago I optimised a video codec for a 16x SIMD vector unit. At the end, when I'd run out of things to parallelise, in the worst case only 10% of runtime was spent in the vector code - the bitstream en/decoding is inherently serial and soaked up the rest. No improvement to the vector path could then give more than 10% improvement.
There's little point having a 20x faster vector path without a balanced improvement in the scalar path to keep the vector unit fed. For the tasks a phone is expected to do, this will give better game rendering, *might* shave a tiny bit off video decoding power consumption but otherwise is wasted silicon. There aren't suitable workloads to throw at it.
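This is just Amdahl's law. A quick sketch (the 10% vector fraction mirrors the codec anecdote above; the speedup figures are illustrative):

```python
def amdahl_speedup(parallel_fraction, parallel_speedup):
    """Overall speedup when only parallel_fraction of runtime benefits."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / parallel_speedup)

# Only 10% of runtime is in the vector path, as in the codec example.
for s in (2, 20, 1000):
    print(f"{s:>5}x faster vector unit -> "
          f"{amdahl_speedup(0.10, s):.3f}x overall")
```

Even an infinitely fast vector unit caps out at 1/0.9, roughly an 11% overall gain, which is why a 20x vector path without a matching scalar improvement is mostly wasted silicon.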
I'm sure this GPU core is revolutionary and amazing, but I can't shake the mental connection: whenever I see the name PowerVR I immediately associate it with the shit Intel GMA500 embedded graphics and their shoddy driver support.
So it'll probably be a lovely graphics core for an iPhone or Windows Mobile device, but you'll be straight out of luck for Android/Linux drivers.
It's about time graphics processors had hardware ray tracing built in. They have the power now, and the algorithms are a lot more efficient than they used to be. Polygons and textures have only ever been a kludge; it's time we waved them bye bye and created consumer systems capable of realtime photorealistic animation.