* Posts by DFellinger

1 publicly visible post • joined 23 May 2013

In-array compute ....

DFellinger

Another POV

One can always count on Chris for exciting headlines, and he certainly was quite inspired on this one!

As the leader of the DataDirect Networks team that pioneered in-storage processing in early 2010, I would like to contribute a few comments to the discussion.

First and foremost, I agree that embedding processes in the storage is not meant to solve general compute problems. It is, however, meant to reduce latency for iterative processes. Just as GPGPUs are utilized to reduce cycle count, embedding applications that execute continuous data operations increases system efficiency. A file server is essentially a data-intensive process that must move data to and from a storage system to satisfy requests from network-connected clients. Moving that data generally entails the use of a serial bus and a cache or socket layer. Regardless of the bus rate or type, there must be a protocol that dictates the bus state, the bidirectional data destination, and the error or retry management. Of course, data is generally copied to the socket before any bus transaction can be called, which adds latency. If the file server shares a memory space with the storage system, a simple page mapping can provide a parallel transfer that is effectively cached, without a copy, for migration to slower media.
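To make the copy-versus-page-mapping point concrete, here is a minimal sketch in Python. It is only an analogy on an ordinary operating system, not our product code: `read_with_copy` stands in for the path that copies data into a separate buffer, and `checksum_mapped` stands in for an embedded process that works on pages mapped into its own address space. The function names and the sample file are hypothetical.

```python
import mmap
import os
import tempfile


def make_sample_file(payload: bytes) -> str:
    """Write payload to a temporary file and return its path (illustration only)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    return path


def read_with_copy(path: str) -> bytes:
    """Copy path: the whole file is copied into a new bytes object."""
    with open(path, "rb") as f:
        return f.read()


def checksum_mapped(path: str) -> int:
    """Mapped path: the file's pages are mapped and read in place.

    No intermediate bytes object holding the whole file is created;
    the sum walks the mapped pages through a memoryview.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # The view must be released before the mapping is closed.
            with memoryview(mapped) as view:
                return sum(view) % 65536


if __name__ == "__main__":
    sample = make_sample_file(bytes(range(256)) * 4)
    try:
        copied = read_with_copy(sample)
        print(sum(copied) % 65536 == checksum_mapped(sample))
    finally:
        os.remove(sample)
```

Both paths compute the same result; the difference is that the mapped version never materializes a second copy of the data, which is the saving I am describing when the server and the storage share memory.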

Filter processes are also iterative and good candidates for embedding. Would I rather download raw data and filter it locally, or upload the process and filter the data before network transmission? Filters like FFTs with complex time-domain functions could benefit from running without iterative network activity. Why not accommodate the entire service structure in page-mapped memory? Managing swap space may not be ideal, but compare it to a SCSI bus transaction. A great example of this approach is the work we do with in-Storage Processing for the Square Kilometer Array.
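The download-then-filter versus filter-then-transmit question can be sketched with a toy example. This is a hedged illustration, not a model of any real system: the "wire" is just a packed byte string, the threshold predicate stands in for a real filter such as an FFT-based one, and all function names are hypothetical.

```python
import struct


def pack_samples(samples):
    """Serialize samples as little-endian 64-bit floats (8 bytes each)."""
    return struct.pack(f"<{len(samples)}d", *samples)


def unpack_samples(wire):
    """Inverse of pack_samples."""
    return list(struct.unpack(f"<{len(wire) // 8}d", wire))


def ship_raw_then_filter(samples, keep):
    """Send everything over the wire, then filter on the client."""
    wire = pack_samples(samples)
    received = unpack_samples(wire)
    return [s for s in received if keep(s)], len(wire)


def filter_then_ship(samples, keep):
    """Filter next to the data, then send only what survives."""
    kept = [s for s in samples if keep(s)]
    wire = pack_samples(kept)
    return unpack_samples(wire), len(wire)


if __name__ == "__main__":
    data = [0.1, 5.0, 0.2, 7.5, 0.3, 9.0]
    above_one = lambda s: s > 1.0
    print(ship_raw_then_filter(data, above_one))  # same samples, 48 bytes sent
    print(filter_then_ship(data, above_one))      # same samples, 24 bytes sent
```

The client ends up with the same filtered samples either way; only the number of bytes crossing the network changes, and the gap grows with the selectivity of the filter.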

Again, embedding does not solve latency in all compute applications, but it does go a long way toward reducing data transmission latency for data-intensive applications.

Dave Fellinger, Chief Scientist, DataDirect Networks