Posts by prof_peter

13 publicly visible posts • joined 19 Dec 2014

As liquid cooling takes off in the datacenter, fortune favors the brave


Quoting the folks who designed our 10MW data center, the main reason liquid cooling might be a good idea is that it gives you "high quality heat".

For obvious reasons it's hard to run air cooling much higher than the 100F or so that we run hot aisles in our data center, and in my experience even that sucks quite a bit. If you cool your racks like we do, with chilled water from a central A/C system running to heat exchangers in your racks, the return "hot" water is probably going to be something like room temperature, and depending on your climate, for much of the year you're going to need quite a bit of A/C to dump that heat into the outside air. (some of you might recognize chilled water to distributed heat exchangers as being a common office air conditioning setup. Those of you who don't are the ones who never had one of those exchangers go bad right above your colleague's cubicle)

In other words, what big air-cooled data centers are doing is generating heat in a server, blowing it out into a (typically enclosed) air space, and then using fans and something that's basically the opposite of a radiator to suck the heat back into a pipe full of water so they can get it back to the central A/C system and get rid of it.

With liquid cooling you get rid of that horribly clunky intermediate step, and more importantly you can run the water a lot hotter than you can run the air in a room used by humans. If the water coming from your racks is 70C or so, it's really easy to get rid of the heat - all you need to do is pump the water around to an outside radiator or evaporative cooler. (well, with a heat exchanger or two and some other complicated but non-power-sucking stuff, but we don't need to get into that)

Basically you want the heat to run "downhill" - if you can extract heat in a form that's significantly hotter than the great outdoors, then all you need is a bunch of pumps to get rid of it.

And that, in a nutshell, is the argument for datacenter water cooling - it's much more efficient at a whole-facility scale, as it gets rid of almost all your air conditioning and turns it into mere water pumping. It might also let you put more watts into a rack because you get rid of that blowing-air-around step in the cooling chain, but that's probably a secondary benefit. Finally, since your existing data center cooling probably involves pumping chilled water around already, you might be able to convert over incrementally, or mix air and water cooling.

Why Intel killed its Optane memory business


Re: Flawed design

I would argue it's not the lack of RAID. In modern systems it doesn't matter if data is persistent and replicated on a single system - it's not considered durable until it's backed up to another machine. (preferably in a different rack or even data center) If you're going to go over the PCIe bus for the NIC, you don't save a lot by keeping storage on the memory bus.

Why we will not have a unified HPC and AI software environment, ever


Does it matter?

Why would you even care if there's a unified framework? It's not like the two groups talk to each other. And it's not like the folks developing hardware for AI really care about the HPC market - to give an idea of the market size ratio, the entire top500 list would fit into US-East-1 with room to spare.

Traditional HPC - batch systems used by physicists etc. - is becoming a smaller part of overall computing at most research universities, as other fields start using more and more computation, and typically choose post-80s environments for doing so. As the fraction of non-batch computation grows, eventually someone's going to start asking the physicists if they can just emulate their ancient environments on something newer.

There ain't no problem that can't be solved with the help of American horsepower – even yanking on a coax cable


A lifetime ago I had to run cables for DEC VS100s, which were graphical displays that connected to VAX 750s. (these were the ones the X Window System was developed on, but I disclaim all responsibility for that...) The machine room was separated from the terminal location by about 100 feet of fairly full conduit, and I swear the cables were coated in the sort of sticky rubber that they use on rock climbing shoes. I think there was a lot of Kevlar or something on the inside of the cable, because we never broke one.

Skype for Windows 10 and Skype for Desktop duke it out: Only Electron left standing


I suppose using Electron explains the incredibly bad Teams UI? We've gone to Teams for our phone system, and the best part about the Mac client is how incoming calls pop up *under* every other window, so you have to frantically move them all aside to answer the call. Pure genius.

Western Digital shingled out in lawsuit for sneaking RAID-unfriendly tech into drives for RAID arrays


Re: Storm in a teacup?

I'm curious as to why the rebuild actually fails.

Contrary to popular belief, drive-managed SMR[*] isn't great for most sequential write patterns. However, if you stream enough sequential data (100-300MB minimum, I would guess, depending on the drive generation) the ones I've played with will eventually go into a streaming mode where they're as fast as normal drives.

I should go and look at the original reports - I'm going to guess that the rebuild was interrupting its sequential writes, and then wasn't able to handle the multi-second write latencies that were the eventual result.

[* the other SMRs are host-managed - they report errors if you try to perform non-sequential writes. They don't sell those through normal channels, probably because they figure we'd RMA them for working as advertised]


Re: Storm in a teacup?

In my mind, the issue is that they branded them as NAS drives, so people went and put them into RAID arrays. RAID5 is just about the worst possible thing I can think of that you can do with drive-managed SMR.

For that matter, last week I just gave up on a DM-SMR drive I was using for my Time Machine backup because I was getting annoyed at how slow and noisy the hourly incremental backups were getting. Not gonna ID the vendor because it was a repurposed engineering sample, and the final shipped version might not be quite as crappy...

LLVM contributor hits breakpoint, quits citing inclusivity intolerance


The code of conduct that Rafael has so much trouble with (https://llvm.org/docs/CodeOfConduct.html) basically requires professional conduct in official LLVM forums, although it's phrased in slightly touchy-feely language.

Basically it equals "don't be an asshole", and requiring that of someone in a professional context is evidently considered "restricting their freedom of speech" nowadays.

Ahem! Uber, Lyft etc: California Supremes just shook your gig economy with contractor ruling


Re: Looked at the other way

Is someone who works part-time at Dunkin Donuts and part-time at Starbucks an employee of both? Of course.

Is someone who works full-time for a company and skips out during the day to work a part-time job somewhere else an employee at both places? Yes, the fact that he's skipping out is an issue between employer 1 and their employee.

Is someone who decides on a ride-by-ride basis whether to take an Uber passenger or a Lyft passenger an employee of both? I think the appropriate answer is "yes, and if Uber and Lyft want to avoid both being on the hook for paying minimum wage, they'd better figure out a legal way to talk to each other about it."

El Reg gets schooled on why SSDs will NOT kill off the trusty hard drive


HDD/Flash and technological evolution

"Will NAND flash replace disk?" isn't really a valid question - the question is "(when) will NAND flash replace disk for application X?", and there are hundreds of applications out there with different answers.

If disks weren't increasing their capacity/price ratio as fast as flash, the answer would be simpler - given enough time, flash would be better for everything. But they're increasing at about the same speed, so the most storage-hungry applications are going to use disk far into the future, while others tip over to flash sooner. In particular, for apps with bounded requirements, once flash becomes cost-effective then it's all over for disk, because you only get the cost/GB that disk offers if you need an entire disk.

For iPods the answer was 2005, which curiously happens to be the year that 2GB of flash became as cheap as a micro disk drive. Every year after, instead of increasing the capacity of their base model as disks grew, they could keep it the same and pay less for flash.
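The "bounded requirements" argument can be sketched in a few lines of Python. Everything numeric here is hypothetical (the starting prices, the 40%/year flash price decline, and the flat price for the smallest disk you can buy); only the shape of the argument comes from the post: a device that needs a fixed number of GB switches to flash as soon as that many GB of flash costs less than one whole disk.

```python
def cheaper_medium(need_gb, flash_price_per_gb, disk_unit_price):
    """Pick the cheaper medium for a fixed capacity need that fits on one disk.

    Flash is bought by the GB; disk is bought by the whole drive, so its
    low cost/GB only helps if you actually need the entire drive.
    """
    flash_cost = need_gb * flash_price_per_gb
    return "flash" if flash_cost < disk_unit_price else "disk"


# Hypothetical numbers, for illustration only.
need_gb = 2
flash_price_per_gb = 80.0   # assumed starting price per GB
disk_unit_price = 60.0      # assumed price of the smallest available disk

for year in range(6):
    choice = cheaper_medium(need_gb, flash_price_per_gb, disk_unit_price)
    print(f"year {year}: {need_gb} GB of flash costs "
          f"{need_gb * flash_price_per_gb:.2f}, disk costs "
          f"{disk_unit_price:.2f} -> {choice}")
    flash_price_per_gb *= 0.6  # assumed 40% per-GB price drop per year
```

Once the choice flips to flash it never flips back under these assumptions, which is the "all over for disk" point: the disk keeps getting bigger for the same money, but the application has no use for the extra capacity.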

For a wider market the transition is less abrupt - e.g. what's been happening with laptops in the last 9 years since Apple introduced their 2nd-gen MacBook Air. (about 45% of laptops are expected to ship with SSDs this year; the rest are evidently sold to 4chan trolls who need the extra storage for all the porn they download...)

For just storing lots and lots of data (like that Microsoft OneDrive account you haven't touched in a couple of years, or the Dropbox folder from a project you finished last year) hard drives are going to remain king for years and years, possibly becoming weird and specialized in the process because they don't have to remain plug-compatible with your old machine.

Finally, flash has had the speed advantage since it was introduced; if you have a small amount of data and you need it to be fast, you're stupid to put it on a disk. However as the size of that data grows it starts becoming cost-effective to play all sorts of caching and tiering games, so that you can use disk for capacity and flash for performance.

Hold on a sec. When did HDDs get SSD-style workload rate limits?


RTFWP (Read The Fine White Paper)

George Tyndall, “Why Specify Workload?,” WD Technical Report 2579-772003-A00


Basically the heads are so close to the platter that they touch occasionally. To keep the drives from dying too quickly, the heads are retracted a tiny bit when not reading or writing. From the article:

"As mentioned above, much of the areal density gain in HDDs has been achieved by reducing the mean head-disk clearance. To maintain a comparable HDI [head-disk interface] failure rate, this has required that the standard deviation in clearance drop proportionately. Given that today’s mean clearances are on the order of 1 – 2nm, it follows that the standard deviation in clearance must be on the order of 0.3nm – 0.6nm. To put this into perspective, the Van der Waals diameter of a carbon atom is 0.34nm. Controlling the clearance to atomic dimensions is a major technological challenge that will require continued improvements in the design of the head, media and drive features to achieve the necessary reliability.

In order to improve the robustness of the HDI, all HDD manufactures have in the recent past implemented a technology that lessens the stress at this interface. Without delving into the details of this technology, the basic concept is to limit the time in which the magnetic elements of the head are in close proximity to the disk. In previous products, the head-disk clearance was held constant during the entire HDD power-on time. With this new technology, however, the head operates at a clearance of >10nm during seek and idle, and is only reduced to the requisite clearance of 1 – 2nm during the reading or writing of data. Since the number of head-disk interactions becomes vanishingly small at a spacing of 10nm, the probability of experiencing an HDI-related failure will be proportional to the time spent reading or writing at the lower clearance level. The fact that all power-on-time should not be treated equivalently has not been previously discussed in the context of HDD reliability modeling."

Why are enterprises being irresistibly drawn towards SSDs?


Re: Change in Flash technology to eliminate finite write lifetime?

Until recently the *only* goal of flash designers was to deliver more bits for less money - until about 3 years ago, roughly 95% of the flash that was produced went into iPods, cell phones, thumb drives, and SD cards, with SSDs accounting for a total of 3% of flash production. In the 10 years before that point, flash performance went down by nearly a factor of 10, and lifespan of low-end flash went down by far more.

A lot can change in 3 years, though, and there is now enough demand for flash in performance-critical roles to at least halt this decline. (whether enough customers will actually pony up the money for higher-performance chips to make them profitable is debatable, though.)

The reason we don't see flash devices failing left and right due to wear-out is that they've been getting bigger more rapidly than they've been getting faster. (or more accurately, than computer I/O workloads have been getting faster) The internal wear-leveling algorithms distribute writes evenly over the chips, so if you build an SSD out of consumer-grade chips with a 3000-write lifetime, you have to over-write the *entire* device 3000 times before the first chips start failing. (Sort of. A small number will die earlier, but there's spare capacity that can deal with that.) For a 500GB laptop drive, that's 1.5PB of writes, or 35GB/hr, 24/7, for 5 years. For a 1TB enterprise drive full of 30,000-cycle eMLC chips, you'd have to run it at 700GB/hr (200MB/s) for those 5 years to wear it out.
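The wear-out arithmetic above is easy to check. This is a minimal Python sketch, assuming perfect wear leveling, no write amplification, decimal units (1 GB = 1000 MB) and 365-day years; the function name is made up for illustration. It lands slightly under the rounded figures quoted in the post.

```python
HOURS_PER_YEAR = 365 * 24


def write_rate_to_wear_out(capacity_gb, rated_cycles, years=5):
    """Sustained write rate in GB/hr needed to use up every rated
    overwrite cycle of the whole device within `years`.

    Assumes writes are spread perfectly evenly (ideal wear leveling)
    and ignores write amplification and spare capacity.
    """
    total_writes_gb = capacity_gb * rated_cycles
    return total_writes_gb / (years * HOURS_PER_YEAR)


laptop_gb_per_hr = write_rate_to_wear_out(500, 3000)
enterprise_gb_per_hr = write_rate_to_wear_out(1000, 30000)

print(f"500GB consumer drive: {500 * 3000 / 1e6:.1f} PB total, "
      f"{laptop_gb_per_hr:.1f} GB/hr for 5 years")
print(f"1TB eMLC drive: {enterprise_gb_per_hr:.0f} GB/hr "
      f"({enterprise_gb_per_hr * 1000 / 3600:.0f} MB/s) for 5 years")
```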

And wear-out isn't just a problem for flash anymore - the newest high-capacity drives are only rated for a certain total volume of reads and writes, due to head wear effects, which works out to a shorter lifespan than flash. (and reads count, too, while they're free on flash) See http://www.wdc.com/wdproducts/library/other/2579-772003.pdf for the gory details...

Disk areal density: Not a constant, consistent platter


Of course they're complicated

Unlike ancient drives with fixed numbers of sectors per track, and thus varying areal density, current drives with ZCAV (google it) try to achieve constant areal density by varying the number of bits on each track. Since bits come in 4KB sectors you're never going to achieve true constant areal density, although at 1-2MB per track, +/- 4KB isn't a big difference.
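To put a number on "+/- 4KB isn't a big difference", here is a small Python sketch. It assumes binary units (1MB = 1024 * 1024 bytes) and treats one 4KB sector as the worst-case rounding error per track; both are simplifications for illustration, and the function name is hypothetical.

```python
SECTOR_BYTES = 4 * 1024


def sector_rounding_error(track_bytes):
    """Fraction of a track's capacity represented by one whole sector,
    i.e. the worst-case deviation from the ideal bit count when a track
    must hold a whole number of sectors."""
    return SECTOR_BYTES / track_bytes


for track_mb in (1, 2):
    err = sector_rounding_error(track_mb * 1024 * 1024)
    print(f"{track_mb}MB track: +/- {err * 100:.2f}% of track capacity")
```

Both cases come out well under half a percent, which is small next to the head-to-head variation discussed in the next paragraph.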

However it's quite possible that what was being referred to is the difference in areal density between *heads* (or equivalently, surfaces) in a modern drive. Different heads in the same device can have wildly different (e.g. 25%) areal densities - google "disks are like snowflakes" for a very interesting article from Greg Ganger and the CMU PDL lab on why this is the case. A simple test in fio can show this behavior - e.g. latency of 64KB sequential reads across the raw disk LBA space. You'll see the disk serpentine pattern, with different speeds for each surface.
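The post suggests fio for this test. As a rough stand-in, the same idea can be sketched in plain Python: read the target front to back in 64KB blocks and record how long each read takes. This is an illustration only, not a replacement for fio: it uses ordinary buffered reads, so on a real device the OS page cache and readahead will smooth the per-read timings, and reading a raw device node (the path is whatever your system uses) normally needs root. The function name and parameters are made up for the example.

```python
import os
import time


def sequential_read_latencies(path, block_size=64 * 1024, max_blocks=None):
    """Read `path` sequentially in fixed-size blocks.

    Returns a list of (byte_offset, seconds) tuples, one per read, so the
    timings can be plotted against position in the LBA space.
    """
    samples = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while max_blocks is None or len(samples) < max_blocks:
            start = time.perf_counter()
            data = os.read(fd, block_size)
            elapsed = time.perf_counter() - start
            if not data:  # end of file or device
                break
            samples.append((offset, elapsed))
            offset += len(data)
    finally:
        os.close(fd)
    return samples
```

Plotting seconds against offset for a long enough run is where the serpentine pattern should show up, with the timings stepping between levels as the drive switches surfaces.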