* Posts by nvo

6 publicly visible posts • joined 31 Oct 2011

Google downshifts App Engine to infrastructure cloud


per month

It's $0.10 per GB per month, not per hour... that'd be worse pricing than VM memory.
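To put the difference in numbers, here is a quick sketch. The 730 hours per month is my own assumption (roughly 365 × 24 / 12), not a figure from the pricing page, and the variable names are made up for illustration.

```python
# Work in whole cents so there is no floating-point rounding in the totals.
PRICE_CENTS_PER_GB = 10   # $0.10 per GB, as quoted in the comment
HOURS_PER_MONTH = 730     # assumption: average month, about 365 * 24 / 12

# Cost of keeping 1 GB for a month under each reading of the price.
monthly_if_per_month = PRICE_CENTS_PER_GB
monthly_if_per_hour = PRICE_CENTS_PER_GB * HOURS_PER_MONTH

print(f"billed per month: ${monthly_if_per_month / 100:.2f} per GB-month")
print(f"billed per hour:  ${monthly_if_per_hour / 100:.2f} per GB-month")
```

Under that assumption, the per-hour reading comes out several hundred times more expensive, which is why it can't be right.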

CSIRO orders 2,144 core, 174,720 GPU upgrade


174,720 GPU cores? Try again.

The number of "cores" in a GPU is highly misleading, and it's a shame to see El Reg perpetuating this confusion. For @#$'s sake, those aren't cores, or anything close, they're 32-bit wide ALUs. Is each lane of a vector processor called a core? No. Is each Xeon E5-2xxx a 64 core processor because it has 8 processors each with 8-way SIMD execution (AVX)? No.

A Fermi SM (Streaming Multiprocessor) is more analogous to a core. Instruction decode and issue happen at the SM level, as does any branching or control flow (beyond basic predication); each SM has 32 ALUs, which are what this article calls cores. Each Tesla GPU has 16 SMs (14 of which are active in the x2050 products).

Let's get this shit together, because SIMD issue width != core count, and it's meaningless marketing drivel to say otherwise.
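For anyone who wants the arithmetic, here is a rough sketch of what the headline figure turns into once you count SMs instead of ALUs. It assumes every GPU in the upgrade is an x2050-class part with 14 active SMs; that is my assumption, not something the article states, and the variable names are just for illustration.

```python
ALUS_QUOTED = 174_720      # the article's "GPU cores"
ALUS_PER_SM = 32           # ALUs per Fermi SM
ACTIVE_SMS_PER_GPU = 14    # x2050 parts; assumed for every GPU in the order

# How many GPUs the quoted ALU count implies, and how many SMs that is.
alus_per_gpu = ALUS_PER_SM * ACTIVE_SMS_PER_GPU
gpus, leftover = divmod(ALUS_QUOTED, alus_per_gpu)
sms_total = gpus * ACTIVE_SMS_PER_GPU

print(f"{alus_per_gpu} ALUs per GPU -> {gpus} GPUs (remainder {leftover})")
print(f"{sms_total} SMs, i.e. the closest thing to 'cores'")

# The same marketing math applied to an 8-core Xeon with 8-wide AVX.
XEON_CORES = 8
AVX_LANES_32BIT = 8
xeon_marketing_cores = XEON_CORES * AVX_LANES_32BIT
print(f"Xeon E5 by the same logic: {xeon_marketing_cores} 'cores'")
```

The quoted number divides evenly into 448-ALU GPUs, which is at least consistent with the assumption above.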

UK's supercomputers rev up to hit 800 teraflops



umm... I think the words you were looking for are "thousand thousand thousand thousand" or "million million". Unless the UK is hiding 800 yottaflops of computing power somewhere...
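A quick sanity check on the prefixes, as a sketch. Only the SI definitions (tera = 10^12, yotta = 10^24) and the 800 from the headline go into it.

```python
TERA = 1000 ** 4     # "thousand thousand thousand thousand"
MILLION = 10 ** 6
YOTTA = 10 ** 24     # SI prefix yotta

# Both spellings suggested above describe the same number.
assert TERA == MILLION * MILLION == 10 ** 12

flops_800_tera = 800 * TERA
flops_800_yotta = 800 * YOTTA

print(f"800 teraflops  = {flops_800_tera:.1e} FLOPS")
print(f"800 yottaflops = {flops_800_yotta:.1e} FLOPS")
print(f"ratio: {flops_800_yotta // flops_800_tera:.0e}")
```

The gap between the two readings is a factor of a million million, so it is not a rounding-level slip.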

Applied Micro leaps ahead in ARM server race


I watched the webcast, and I didn't see any indication of how many cores were running in the FPGA simulation (though the article indicates that it is 128 cores). Given that only a pair of the FPGAs appear dedicated to the CPU complex, I suspect that much more than a single board is necessary to simulate the full 128 cores; this board seems intended to mirror the architecture of an SoC. Honestly, I'm not terribly surprised - four-issue OoO cores are not traditionally small devices.

Because the cores share an L2 cache (the presentation refers to them as a module, although that brings back unpleasant memories of certain AMD slide decks that haven't delivered), we will probably see the cores coming in pairs. At least for validation purposes, they probably want two modules, so that they can test the on-chip coherency fabric.


I guess... At the same time, we know there is already a solid high-speed SERDES bank on the first generation for the pair of 10Gb Ethernet ports (either that or it's a hard block, which I kind of doubt). That seems like a lot of bandwidth for just 2, or even 4 cores. If memory serves, managing even a single 10GbE port takes a decent amount of CPU time, or at least it did when AnandTech ran tests on dual Shanghai Opterons a couple of years ago.

Moreover, it looks like, regardless of the initial core count, the SMP interconnect is a separate interface from the SERDES block, as are the SATA/SAS PHYs. (Sidenote: those configurable blocks for handling network protocols, RAID, etc. are *%^#ing sexy)

That is a lot of overhead; it seems like they'd want to cram as many cores as possible onto the die, simply to distribute that overhead over more CPUs.

If that shot of the simulation board is correct, then we know it only takes seven 40nm FPGAs to simulate 128 cores; this would suggest that more than 16 cores can be simulated on a given FPGA, and while I'm just a layman, it's kind of obvious that, in terms of logic density, FPGA << ASIC.
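The "more than 16" figure is just division, sketched below. It assumes all seven FPGAs host cores and that the cores are spread as evenly as possible; neither is confirmed by the photo.

```python
import math

CORES = 128   # core count claimed for the simulation
FPGAS = 7     # FPGAs visible on the board

# Average load per FPGA, and the minimum the busiest FPGA must carry
# if the cores are spread as evenly as possible.
average_per_fpga = CORES / FPGAS
busiest_fpga_min = math.ceil(CORES / FPGAS)

print(f"average: {average_per_fpga:.2f} cores per FPGA")
print(f"at least one FPGA holds {busiest_fpga_min} or more cores")
```

If some of the FPGAs are reserved for non-CPU logic, the per-FPGA count only goes up.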

This is just a little devil's advocacy. Whatever the core count turns out to be, if Applied Micro can deliver, we may witness round two of the ISA wars sooner than we thought.


More than 2 cores.

Umm... Hey Tim? You might want to take a closer look at that nifty little block diagram.

If you look a little closer, you can see under the slightly transparent CPU complex block. It's worth noting that what's on that block contradicts your supposition of 2-4 cores.

If the diagram is accurate, we are looking at the following:

8 ARMv8 cores at 2.5GHz, arranged in 4 pairs, each core having its own independent L1I$ and L1D$, but with the L2 (and presumably the system interface) shared by the two cores in each pair. The L3$ is stated to be 8MB, and connects to the cores through the coherent network. Moreover, there are two "memory bridges" connected to the coherent network, each with a pair of DDR3 controllers (this makes 4 channels total, if the controllers are 64b wide).
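Written out as a small model, the diagram's numbers tally up as follows. This is only a sketch of my reading of the block diagram: the class and field names are invented for illustration, and the 64-bit controller width is the same assumption flagged above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuComplex:
    """Hypothetical summary of the block diagram, not a vendor spec."""
    cores: int = 8
    cores_per_l2: int = 2             # each pair shares one L2
    l3_mb: int = 8
    memory_bridges: int = 2
    controllers_per_bridge: int = 2   # DDR3 controllers per bridge
    controller_width_bits: int = 64   # assumption, not shown in the diagram

    @property
    def l2_caches(self) -> int:
        return self.cores // self.cores_per_l2

    @property
    def memory_channels(self) -> int:
        return self.memory_bridges * self.controllers_per_bridge

    @property
    def memory_bus_bits(self) -> int:
        return self.memory_channels * self.controller_width_bits


soc = CpuComplex()
print(f"{soc.cores} cores, {soc.l2_caches} shared L2s, {soc.l3_mb}MB L3")
print(f"{soc.memory_channels} DDR3 channels, {soc.memory_bus_bits} bits wide in total")
```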

This is very different from the supposed 2 cores, and much more exciting as well. If Applied Micro can get this in at 2W per core (perhaps 30W or 40W per node overall, memory and storage included), the datacenter is going to be an interesting place in late 2012.