Why don't they just put that 4 Gbytes of memory on the chip and call it a day? They've gotten the geometries down so far, just add a bunch of memory while you are at it.
You heard it here first.
X86 processor and now coprocessor maker Intel is determined to make possible an exaflops of aggregate computing capacity within a 20-megawatt envelope. "We believe that we have the capability and the responsibility to drive innovation faster and faster," says Rajeeb Hazra, general manager of the Technical Computing Group, which …
The speed of light in vacuum is ~30 cm per nanosecond. Roughly one foot. That would be three meters per ten nanoseconds. Since Tianhe-2's layout is 27 meters on a side, 38 meters from corner to corner, that would be over 100 nanoseconds (about 127 ns) from one node to a distant node in the best case, which is no longer "tens of nanoseconds". It's probably best if we assume a spherical datacenter or a smaller footprint if we are to believe this figure.
In modern supercomputers not all nodes are connected to - or communicate with - all other nodes. The last big "full crossbar" beast was the Earth Simulator, I think. The typical communication pattern is between closest neighbours and maybe some a bit further away. If you remember your school calculus, that's what you typically need to compute derivatives (to solve partial differential equations, be it for simulating a nuclear explosion or to design an aircraft's jet engine or to price a complex financial instrument or whatever): compute function values at adjacent points in parallel, subtract, etc. Sometimes you need to stitch together the opposite ends for periodic boundary conditions (you can think of a spherical high altitude nuclear blast as an example, but you don't have to), which is where those (multi)toroidal interconnects like Blue Gene (still 4 out of top 8 systems about 10 years after the very first one took the top spot - not bad) come in.
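To make the nearest-neighbour point concrete, here is a minimal single-process sketch in Python. It is not how any real machine or code does it, just an illustration: each grid point only needs its two adjacent values, and the modulo wrap plays the role of the torus link that stitches the opposite ends together. The function name, the grid size and the sine test function are my own assumptions for the example.

```python
import math

def periodic_central_difference(values, spacing):
    """Approximate the first derivative at every grid point.

    Each point uses only its left and right neighbours. The modulo wrap
    joins the last point to the first, i.e. periodic boundary conditions.
    """
    n = len(values)
    return [
        (values[(i + 1) % n] - values[(i - 1) % n]) / (2.0 * spacing)
        for i in range(n)
    ]

# Hypothetical test case: sin(x) sampled over one full period.
n = 64
h = 2.0 * math.pi / n
samples = [math.sin(i * h) for i in range(n)]

derivative = periodic_central_difference(samples, h)

# The exact derivative is cos(x); the error shrinks as the grid is refined.
max_error = max(abs(d - math.cos(i * h)) for i, d in enumerate(derivative))
print(f"max error vs cos(x): {max_error:.5f}")
```

If you split the grid across nodes, each node would only have to swap its edge values with the nodes holding the adjacent chunks, which is why neighbour-to-neighbour latency matters far more than corner-to-corner latency.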
So, your physics is basically right. Well, the signal travels through cables and not through vacuum, so I'd say 20cm/ns, maybe about 150ns over 30m. One should take into account though that nodes 30m away from each other wouldn't need to communicate (much?) when running a typical "scientific" job. Over 3-5m (inside a cabinet? between adjacent cabinets?) you'll fit the latency into "tens" if speed of light is your main concern.
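For anyone who wants to check the arithmetic, a quick back-of-the-envelope sketch in Python. The two speeds are the rough figures quoted in this thread (30 cm/ns in vacuum, my guess of 20 cm/ns in cable), and this counts propagation delay only, so no switches, no serialization, no software stack. The names are made up for the example.

```python
import math

# Rough propagation speeds from the thread, in meters per nanosecond.
VACUUM_M_PER_NS = 0.30   # speed of light in vacuum, ~30 cm/ns
CABLE_M_PER_NS = 0.20    # assumed signal speed in a cable, ~20 cm/ns

def one_way_delay_ns(distance_m, speed_m_per_ns):
    """Propagation delay only: distance divided by signal speed."""
    return distance_m / speed_m_per_ns

# Tianhe-2 floor plan as quoted above: 27 m on a side.
side_m = 27.0
diagonal_m = math.hypot(side_m, side_m)  # corner to corner, about 38 m

print(f"diagonal: {diagonal_m:.1f} m")
print(f"vacuum, corner to corner: {one_way_delay_ns(diagonal_m, VACUUM_M_PER_NS):.0f} ns")
print(f"cable, 30 m: {one_way_delay_ns(30.0, CABLE_M_PER_NS):.0f} ns")
for distance_m in (3.0, 5.0):
    delay = one_way_delay_ns(distance_m, CABLE_M_PER_NS)
    print(f"cable, {distance_m:.0f} m: {delay:.0f} ns")
```

So the short hops land at roughly 15 to 25 ns of pure propagation, which is where the "tens" comes from.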
Note that "collective" operations (such as gathering results from many/all nodes) are normally done over separate networks and don't have the same latency requirements as parallel computations proper.
the [redacted] days where they could just illegally [redacted] and [redacted] and engage in anti-competitive highly [redacted] and charge whatever they like are over.
the scale and the speed and the scope of their lawbreaking makes [redacted] look like a traffic warden