Wine won't help. After all, the fact that it is NOT an emulator means it does nothing to help run x86 instructions on ARM. So unless Crysis is recompiled for ARM, you have no hope of running it.
GE puts new Nvidia tech through its paces, ponders HPC future
A top General Electric techie gave a presentation at the GPU Technology Conference this week in San José, California, and discussed the benefits of Remote Direct Memory Access (RDMA) for InfiniBand and its companion GPUDirect method of linking GPU memories to each other across InfiniBand networks. And just for fun, the GE tech …
-
-
Saturday 23rd March 2013 10:42 GMT John Smith 19
A very clear explanation of a very complex subject
This could have been buried in part numbers and stats, but I found it admirably concise. Some writers seem to feel that technical subjects have to be complex to understand, as if you have to prove you're smart enough to read their work.
I'm no expert, but I felt this gave me a good enough understanding that R-DMA gives you an 8x speed-up in data throughput to a GPU, which sounds a pretty worthwhile gain.
Given what we now know about how many FLOPS it takes to animate a face this is not to be sniffed at.
Thumbs up for a nice write up.
-
Monday 25th March 2013 09:35 GMT Michael H.F. Wilkinson
Cool, seriously cool!
We live in such exciting times. I have several algorithms which could benefit from much faster global access (and just more processing grunt, but that goes without saying). I do worry about how to harness the power embedded in your typical GPU architecture. They do not seem to like data-driven processing order much. Scientifically, that is a challenge of course, not a problem.
-
Monday 25th March 2013 20:56 GMT DrBandwidth
Arithmetic, anyone?
Interesting piece, but I worry about any table of results that does not appear to be internally consistent....
The piece does not define what exactly is meant by "latency", but it is odd that the results in the second and third columns (both labelled "latency") are precisely 1/2 of the transfer time that one would compute using the complicated mathematical formula "time = quantity / rate". In this case, for example, one would expect transferring 16 KiB at a rate of 2000 MB/s to take 8.192 microseconds, rather than the 4.09 microseconds stated in the table.
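For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes KiB means 1024 bytes and MB/s means 10^6 bytes per second, which is how the 8.192 microsecond figure comes out; the function name is just an illustrative choice.

```python
def transfer_time_us(size_bytes: int, rate_bytes_per_s: float) -> float:
    """Time to move size_bytes at a constant rate, in microseconds."""
    return size_bytes / rate_bytes_per_s * 1e6


# Assumed units: 1 KiB = 1024 bytes, 1 MB/s = 10**6 bytes per second.
block_bytes = 16 * 1024
rate = 2000 * 10**6

full = transfer_time_us(block_bytes, rate)
print(f"time = quantity / rate: {full:.3f} us")
print(f"half of that:           {full / 2:.3f} us")
```

The second figure is the one that lines up with the table's stated "latency" for this block size.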
Latency might mean "time for the first bit of the data to arrive" or it might mean "time for the entire block of data to arrive". The latter is not possible given the stated numbers (since all the computed transfer times are precisely twice the stated "latencies"), while the former would imply a mildly perverse buffering scheme that always buffered precisely 1/2 of the data before beginning to deliver it to the accelerator.
Buffering exactly 1/2 of the data is perhaps not as crazy as it sounds -- such schemes are sometimes (often?) used in optimized rate-matching interfaces. If the input is guaranteed to be a contiguous block, then buffering exactly 1/2 the data allows the buffer to transmit the output data at 2x the input rate after pausing for 1/2 of the transfer time. Such a scheme minimizes the latency between the arrival and delivery of the final bit of data in the input block. Unfortunately, it also makes the actual hardware latency invisible (provided that the hardware latency is less than 1/2 of the transfer time of the smallest block with reported results).
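A rough way to convince yourself that the half-buffer scheme never runs dry is to model it. The sketch below is only an idealized model under my own assumptions (constant input rate, zero hardware latency, output starting at exactly half the transfer time and running at twice the input rate); the function names are made up for illustration and say nothing about how the actual hardware works.

```python
def arrived(t: float, size: float, rate: float) -> float:
    """Bytes received by time t at a constant input rate."""
    return min(size, max(0.0, rate * t))


def delivered(t: float, size: float, rate: float) -> float:
    """Bytes sent on by time t: nothing until half the transfer time
    has passed, then output at twice the input rate."""
    start = (size / rate) / 2
    return min(size, max(0.0, 2 * rate * (t - start)))


def simulate(size: float, rate: float, steps: int = 1000):
    """Sample the transfer, check the buffer never underflows, and
    return (total transfer time, peak buffer occupancy in bytes)."""
    total = size / rate
    tolerance = 1e-9 * size  # slack for floating-point rounding
    peak = 0.0
    for i in range(steps + 1):
        t = total * i / steps
        a = arrived(t, size, rate)
        d = delivered(t, size, rate)
        if d > a + tolerance:
            raise RuntimeError(f"buffer underflow at t={t}")
        peak = max(peak, a - d)
    return total, peak


total, peak = simulate(size=16 * 1024, rate=2000e6)
print(f"transfer time: {total * 1e6:.3f} us")
print(f"peak buffer:   {peak:.0f} bytes")
```

In this model the output catches up with the input exactly as the last byte arrives, and the buffer peaks at half the block size at the moment delivery starts.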
Whether this buffering scheme makes sense depends a lot on the data access patterns of the subsequent processing steps. If the subsequent step demands that a full block be in place before starting, then this is the way to go. On the other hand, many signal processing algorithms could pipeline operations with data transfers in smaller blocks, in which case a different buffering scheme might make more sense.