From the article: "it is impossible to deny the attraction cool technology has"
Surely that's a tautology - if it had no attraction it wouldn't be very cool, now would it?
Playing with cool technology is not why either of us chose systems administration as a career. Nor was it something we expected as part of our employment. That said, it is impossible to deny the attraction cool technology has. It is fitting, then, that on systems administrator day we get the chance to publish a review on the …
2 kW in 2U is just nuts. If you aren't going to have direct-to-water cooling, then just resign yourself to going elsewhere. Why move all of that air when a few gpm (or lpm) of water can do the job?
We had water-cooled IBM mainframes here for many years and I can only recall one very minor leak.
Of course, you could always immerse the whole kit in some nice cold Fluorinert... :-)
This is not the densest system ever. Take a look at the Dell C4130, a dual socket 1U server with four GPUs, or the FX2 chassis with the FC430 blades. Eight dual socket machines in 2U. Both systems can draw over 60 kW per rack. And extensive testing shows the best cooling method is: air! We get best results with active rear door cooling, or in-row cooling systems.
...because this one's ruined by drool now. I salute the awesomeness of my fellow Canadians.
A reply to Me19713: not every organisation can run to water-cooled stuff. As long as I could site this down the hall, I would take fans over water, because it's simpler, as long as they do the job.
Unfortunately, I don't think it can. Tesla cards are not made for DirectX or OpenGL, they're more suited towards CUDA and other mathematical operations. My first thought was "oh yiss I'm going to spin up 20 VMs and make them all run GTA at the same time muahahahaha!"
Tesla K40 cards definitely have the GPU power to do that, but that's just not the way they're built. Their graphics acceleration capability is actually pretty limited in my experience - which is exactly why we're calling in other administrators to test them. These things are quite literally built to go into supercomputer nodes (for example, the Cray CS300 supercomputer uses Tesla K10s) to crunch numbers on massive data sets quicker and more efficiently than you could with even the fastest Intel CPU... which means using software and syntax I for one have never played with before.
This is hardware for scientists and academics, not so much gamer nerds like myself. I learned a lot about supercomputing, though!
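For anyone wondering what that number-crunching code actually looks like, here's roughly the flavour of it - a minimal CUDA SAXPY sketch, nothing from the actual review kit, with made-up sizes and values:

// saxpy.cu - minimal CUDA sketch (illustrative only; array size and values are arbitrary)
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread handles one element: y[i] = a * x[i] + y[i]
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;              // ~1 million elements, just for illustration
    size_t bytes = n * sizeof(float);

    float *hx = (float *)malloc(bytes);
    float *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;                     // device (GPU) buffers
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements
    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, dx, dy);
    cudaDeviceSynchronize();

    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f (expect 5.0)\n", hy[0]);

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}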
The Tesla K20 series and later do support OpenGL 4.4. On the K20 you have to turn on OpenGL support via the nvidia-smi utility; on the K40 and later it is enabled by default.
What the Teslas don't have is a video-out connector of any sort. However, we use the Teslas in a medical imaging visualization cluster, where computation and OpenGL rendering is done on the GPUs and displayed remotely using VirtualGL and the TurboVNC or Cendio ThinLinc client over 10 and 1 GbE links. I don't know about DirectX support, but if it is supported, RDP server-side rendering could do the same under Windows.
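For anyone setting something similar up: before touching the OpenGL switch (I believe it's the GPU Operation Mode option in nvidia-smi, something along the lines of nvidia-smi --gom=0 plus a reboot - check the help text rather than take my word for it), it's worth confirming which cards the node actually sees. A trivial CUDA device query does the job; this is just a sketch along the lines of NVIDIA's deviceQuery sample, nothing specific to our cluster:

// devquery.cu - list the CUDA devices a node sees (sketch using the standard runtime API)
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA devices visible\n");
        return 1;
    }
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("GPU %d: %s, %.1f GB, compute capability %d.%d\n",
               d, prop.name,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
               prop.major, prop.minor);
    }
    return 0;
}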
If a small company had a numerical computation that required computing power in the range supplied by this device, how much would they have to pay per hour/day whatever to use such a system?
In other words, how do the economics of one of these stack up against (say) an Amazon-style computation service spun up when needed?
If the characteristics of the computations needed do match your server, would half a rack of these things pay your pension?
"Nah. Not that expensive mate. Aiming high, we're probably at $1500 for the chassis, $1500 each for the chips, $4000 for the RAM and $3500 per card? This is a $15K server. Tops."
Not even close.
The chips (E5-2697 v3) are something like $2700 apiece = $5400
4 x Tesla K40 = $12K
256GB DDR4 2133 ECC = $2500
A pair of unspecified 15K drives and 6 x 480GB SSDs come to about $2000.
So, along with the $1500 chassis, the server comes to $28800.
Prices are from Amazon and are in USD.
"Supermicro!"
If you'd really done your journalistic part you'd not only have got your figures right but also told us what makes this server board superior to all the rest. Is there any other reason to choose this server unless you need to drive 4 Tesla cards in 2U? And, as you propose the scenario of a rack full of such servers, why not go for 4U units such as the 4028GR-TR that can take 8 such cards? What would be the benefit of either approach?
You mention that "the maximum amount of compute capacity you can cram in to 2U of rack space" but that's only the amount of GPU compute power you can cram into this - there are 2U servers with 4 Xeons which can take more RAM.
The replacement of internal USB/SD with the DOM SATA connectors bugs me a bit - how do you mount the drives in another system or USB cradle, since the power connector is also nonstandard? Certainly the negligible power savings angle doesn't quite fit in an article about a server with 2kW PSUs. Why didn't SM settle for PCIe M.2 if they oppose industry standard USB and SD? Those (M.2) are not capacity limited, they're cheaper, abundant and several times faster. I'd like to hear your opinions on these things.
"We inadvertently got the chance to test this as the ambient temperatures soared to 35 degrees Celsius outside and the server was left on during testing to heat the room it occupied. The system handled elevated ambient temperatures quite well."
So it worked according to the specs. How about that!
Ah, you're right, I goofed on the calculations. Still, $30k for such a system isn't bad at all, mate. That's a hell of a lot of compute for that price.
"Is there any other reason to choose this server unless you need to drive 4 Tesla cards in 2U? And, as you propose scenario of a rack full of such servers why not go for 4U units such as the 4028GR-TR that can take 8 such cards? What would be the benefit for either approach?"
There isn't really a universal reason to pick one over the other. The world isn't quite so black and white, and each of these systems has a reason to be. It really boils down to "can your workloads make use of X amount of oomph per node?". To be perfectly honest, we're running into issues driving 4 GPUs in a single node, and these aren't the top-end GPUs. 8 per node starts getting into "having specialist HPC code" territory.
I suspect 2 GPUs is the sweet spot for many workloads, with 4 GPUs being the border of what's workable with today's software.
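To give a flavour of why it gets awkward: even the naive way of splitting one job across every card in the box means the host code has to slice the data, hop between devices and keep them all fed. The sketch below is purely illustrative (the kernel and sizes are invented); real HPC codes add pinned memory, streams, MPI/NCCL and the like to keep four or eight cards genuinely busy at once, which is exactly the specialist territory I mean.

// multigpu.cu - naive split of one job across every visible GPU (illustrative sketch only)
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Stand-in kernel: scale a chunk of the data in place
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    int ngpu = 0;
    cudaGetDeviceCount(&ngpu);
    if (ngpu == 0) return 1;

    const int total = 1 << 24;                    // arbitrary problem size
    const int chunk = (total + ngpu - 1) / ngpu;  // elements per GPU
    std::vector<float>  host(total, 1.0f);
    std::vector<float*> dev(ngpu, nullptr);

    // One slice per GPU. Note this loop is host-driven and largely sequential;
    // keeping 4 (let alone 8) cards genuinely busy at once needs pinned memory,
    // streams and usually a proper HPC framework.
    for (int g = 0; g < ngpu; ++g) {
        int offset = g * chunk;
        int n = (offset + chunk > total) ? total - offset : chunk;
        cudaSetDevice(g);
        cudaMalloc(&dev[g], n * sizeof(float));
        cudaMemcpy(dev[g], host.data() + offset, n * sizeof(float), cudaMemcpyHostToDevice);
        scale<<<(n + 255) / 256, 256>>>(dev[g], n, 2.0f);
        cudaMemcpy(host.data() + offset, dev[g], n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dev[g]);
    }
    printf("host[0] = %f (expect 2.0)\n", host[0]);
    return 0;
}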
"You mention that "the maximum amount of compute capacity you can cram in to 2U of rack space" but that's only the amount of GPU compute power you can cram into this - there are 2U servers with 4 Xeons and which can take more RAM."
GPUs > CPUs in terms of compute. Though I do understand you on the RAM thing. Believe it or not, I have a pretty in-depth article on the trade-off between the two and why sometimes RAM matters (holding billions of variables, etc.).
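Quick arithmetic on the "billions of variables" point: a billion double-precision values is 8 GB, so a single K40's 12 GB of GDDR5 fills up fast, while the 256 GB of host RAM in this box does not. A trivial, purely illustrative CUDA sketch to see what each card actually has free:

// meminfo.cu - report free vs total memory on each GPU (sketch)
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaSetDevice(d);
        size_t freeB = 0, totalB = 0;
        cudaMemGetInfo(&freeB, &totalB);
        printf("GPU %d: %.1f GB free of %.1f GB\n", d,
               freeB / 1073741824.0, totalB / 1073741824.0);
    }
    return 0;
}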
"Why didn't SM settle for PCIe M.2 if they oppose industry standard USB and SD?"
Actually, this is an emerging standard where power is supplied as part of the SATA port. It's quite common on a number of systems. NVMe setups use it, for example. (Though in the case of NVMe there is also a connector for the PCI-E lane.)
Those (M.2) are not capacity limited, they're cheaper, abundant and several times faster
M.2 connectors are, for the most part, better. There are issues about "where do you put the smegging thing" on a system crammed that densely, which is why I suspect they've stuck with the SuperDOM. I should point out here, however, that SuperDOM is not entirely proprietary. Plenty of folks make SATA flash drives that are powered by the SATA port in much the same way. For reasons I don't entirely understand, the industry seems to be moving to M.2 for consumer/SMB-level hardware and these powered SATA drives for server stuff. I've actually been gathering info for a deep dive into the whys of it.
So it worked according to the specs. How about that!
Which is actually quite amazing. Plenty of systems I've tested don't work within the claimed operating temperatures. Supermicro consistently does. Given the power density of this unit, I'm quite surprised that they managed it.
Audi Formula 1 engine designers point out that noise is wasted energy. https://youtu.be/FFwoxM1MiBw?t=5m46s
Density can be done with water virtually silently and more efficiently. The first time you hear it the adrenaline flows, but it will wear on you over time.
Funny article, as Dell has done the same in 1U with only 1.6 kW, just a few fewer disks, but let's face it, these machines should go into a setup that has fast central storage. With these ultra-dense ones, you can fit 40 of these babies in one rack, drawing a whopping 65 kW. And for the cold plate cooling guys, it's possible, but we see better efficiency with rear door cooling.
And they are really good at heating up your home, although the sound pressure may be a bit hard on the people living there.