What I would like to see...
....is a plug-in replacement for the archretarded 220V per-server downtransformers, with the replacement feeding off an externally-provided 24V DC feed. That would be win.
Probably someone who has a patent on that though.
Intel wants you to know that data centers are wasting energy - and money - by over-cooling their servers, burdened by warranties that may prevent them from aggressively raising their temperature. During a wide-ranging discussion on enterprise-level cloud computing last week, Intel's director of platform technology initiatives …
"..But we do know that we can meet the requirements of ASHRAE in this context..."
"..he does suggest that ASHRAE's guidelines are sound..."
I'm sure that ASHRAE are a well run and highly professional organisation and if I wanted any advice on the design, operation and safe running of any kind of cooling system then they would be the first people I'd call on.
However, I fail to see how they can be the 'authority' for the maximum recommended temperature of air inflow into a datacentre as well as other aspects of recommended temperatures for cooled equipment. Surely these figures are the responsibility of the designers and manufacturers of the equipment?
(Note, by this I do not include any legal requirements associated with human safety and comfort, these are covered by legislation that ASHRAE would 'pass on' in other operating recommendations).
...most of the recent Sun and Dell x64 hardware that I've come across doesn't allow you to regulate the onboard [very high] rpm fans, so I suspect if the average server room started using this policy they'd see server fan speeds increasing as they falsely try to compensate due to the higher ambient.
Haven't tested this in a year or so, but I remember seeing an increase of 0.5A on some x64 iron with the fans running at full 15,000rpm.
Obviously the likes of Google can get away with this, since the hardware is sufficiently customised. The article does hint at the fact that this requires integration of server cooling and 'CRAC' systems.
As an aside, some of the smaller equipment / server rooms that we've seen (around 4 racks) are deliberately run cooler than is really necessary simply because it increases the response time in the event of HVAC failure (it takes longer for the temperature to start triggering auto-shutdowns etc). This is important for organisations that are running their own small server rooms on the cheap, and don't necessarily have N+1 on those units (no matter how much they need it!).
One thing no one talks about when running "hot" is the total lack of safety margin for cooling failures. We run too cold for one reason: It buys us time in the event of a cooling failure to either fix the issue or get shut down gracefully. Not the best tradeoff, but our facility doesn't have the capability for redundant systems so we need whatever time we can get in case of a failure. Thermal inertia is a wonderful thing in those circumstances, but if you're already near the limits you have little to no time to react.
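A rough way to put numbers on the "buys us time" argument, as a minimal sketch only. It assumes a lumped model in which all server heat warms a single thermal mass and nothing leaks out through the walls; the 10 kW load, the 2000 kJ/K heat capacity and the 35C shutdown threshold are hypothetical figures, not taken from anyone's room.

```python
def minutes_until_limit(start_c, limit_c, heat_load_kw, heat_capacity_kj_per_k):
    """Minutes for a room to warm from start_c to limit_c with no cooling.

    Lumped model (an assumption): the heat load warms one thermal mass
    uniformly and no heat escapes the room.
    """
    if start_c >= limit_c:
        return 0.0
    energy_kj = heat_capacity_kj_per_k * (limit_c - start_c)
    seconds = energy_kj / heat_load_kw  # 1 kW = 1 kJ/s
    return seconds / 60.0


# Hypothetical small room: 10 kW of kit, 2000 kJ/K thermal mass, shutdown at 35C.
cool_start = minutes_until_limit(21, 35, 10, 2000)
warm_start = minutes_until_limit(27, 35, 10, 2000)
print(f"from 21C: {cool_start:.1f} min, from 27C: {warm_start:.1f} min")
```

Under those assumptions, the reaction time scales with the gap between the setpoint and the shutdown threshold, so starting at 21C rather than 27C gives 14/8 = 1.75 times as long to react.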
"Perhaps CRAC dealers with enlightened self-interest will find increased profits in retrofitting their installed bases with power-management intelligence."
It seems, though, that most CRAC(K) dealers end up using their own product, then they lose all sense of enlightenment, in self interest or otherwise. They generally quickly devolve into craven animals.
Any decently designed datacentre that has an office building nearby will have heat exchangers that, effectively, heat the offices for free. One place I worked changed from a cluster of ECL mainframes to CMOS beasties merely on the basis of saving £250k per year in power costs. Worked fine, until they had to re-instate the offices' boilers when it started to get chilly.
Anyhow, the issue that Intel raises is more about getting the chilled air to where it's needed: at the processors and disks - not the surrounding environment. Maybe a better approach would be to address the ventilation issues with high-density servers, blades and disk arrays, then the rest of the datacentre space can be at any temperature they like.
I've had both AC failures and sudden expansion.
The AC failure was due to power maintenance work taking place beside the server room, that in no way would impact the server room. Guess what, they shut down the AC, five units became 1 unit. Routers were the first to go, then the servers at head height. It took three hours though to get to that point, so thermal inertia is good.
Worst-case scenario is that there is a hush-hush project coming that will add to your server room. Except they've checked you've got the space - by simply looking in. Finding out that a 25% increase in servers is arriving and that you have to hook them up ASAP isn't good, being told "I didn't budget for cooling" is worse, and hearing that the lead time to get cooling in is 6 weeks is worst of all.
In the event of failure of either a server's fans or the aircon itself the server or room will very quickly approach a critical level if what Intel suggests was allowed. Keeping the room cool (21C) buys us extra time to fix problems with fans and aircon, as Anonymous Coward suggests.
It's amazing that Intel didn't mention this - or did El Reg not report on that?
People have tried that. Actually they ran servers at 48v DC, because that's the standard used by phone companies for their kit. It was a business failure.
(It's a "special" requirement on your servers that locks you into a limited number of vendors and a limited number of server configurations).
(48v was chosen because higher voltages mean lower current, which means thinner wires, which means cheaper wires - copper is expensive. It's also the legal "safe" limit, so you don't need to worry so much about preventing people getting electric shocks).
Server racks running from 24V would need to be connected not with plug-in jacks, but with copper bars bolted together. Watts = V x A and wire losses are I^2 x R. Cutting the voltage to 10% would make the current 10x higher and make the losses 10x higher even when 10x copper is used.
The losses in switching power supplies are mostly proportional to the current, so most of the losses are on the low voltage side, not the high voltage side.
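To check the wire-loss arithmetic above, here is a small sketch. It assumes the same delivered power, a purely resistive feed, and resistance that scales inversely with copper cross-section; the function and parameter names are made up for illustration.

```python
def relative_wire_loss(voltage_divisor, copper_multiplier):
    """Feed loss relative to the baseline, for the same delivered power.

    Dividing the voltage by voltage_divisor multiplies the current by the
    same factor. Multiplying the copper cross-section by copper_multiplier
    divides the resistance by that factor. Loss is I^2 * R.
    """
    current_factor = voltage_divisor
    return current_factor ** 2 / copper_multiplier


same_copper = relative_wire_loss(10, 1)    # a tenth of the voltage, same wire
ten_x_copper = relative_wire_loss(10, 10)  # a tenth of the voltage, 10x the copper
print(same_copper, ten_x_copper)
```

With the same wire the loss is 100 times the baseline, and ten times the copper only brings it back down to 10 times, which matches the figure quoted above.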
Sure, the power bill is higher, but the accountants will happily pay it. Now, if we break servers, say, 2 years earlier, I'm not sure the chief of IT will be too pleased.
But as I don't want to get told to "merge" 2 servers until we get the replacement budget... I'd rather stick with bringing a little discomfort to polar bears, thank you very much.
Mine's the one with polar bear fur inside, to stay warm in the server room.
>One thing no one talks about when running "hot" is the total
>lack of safety margin for cooling failures.
The thing is, 20-25 degC isn't hot. Have you ever actually seen a machine get hot enough that it fails? Nope. Modern machines are pretty good at protecting themselves from thermal meltdown. In IT you seem to have people that like to tell stories (once we had a machine get so hot it burst into flames blah blah) and get alarmed about things. Most semiconductor devices can withstand temperatures that would have data centre admins running around screaming about the end of the earth.
>We run too cold for one reason: It buys us time in the event of a
>cooling failure to either fix the issue or get shut down gracefully.
Read: gives admins who don't understand much past putting machines into holes and plugging in wires something to do, and to make up stories about.
>Not the best tradeoff, but our facility doesn't have the capability for
>redundant systems so we need whatever time we can get in case
>of a failure.
Your machines don't have any of their own thermal management?
Modbus, eh? Must be a really difficult protocol if you can't make it talk to a PC.
Utter tosh of course.
Modbus is about as simple as you can get and dates back to the days when paper tape was a viable offline storage medium, in fact Modicon (owners/developers of the Modbus protocol used by their programmable logic controllers) used to offer a service of uploading the program from your controller and saving it to paper tape which they'd keep safe for you. For a fee, obviously. Storage in the cloud and all that, circa 197x.
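For anyone doubting how simple it is, here is a minimal sketch of building one request by hand. It assumes the serial RTU framing (the comment above doesn't say which variant is meant), and the helper names are invented for the example. A "read holding registers" request is just unit id, function code 0x03, start address and register count, followed by a CRC-16 sent low byte first.

```python
import struct


def crc16_modbus(data: bytes) -> int:
    """CRC-16 as used by Modbus RTU: init 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def read_holding_registers_request(unit_id: int, start_addr: int, count: int) -> bytes:
    """Build an RTU request frame for function code 0x03."""
    # Address and count are big-endian; the CRC is appended little-endian.
    body = struct.pack(">BBHH", unit_id, 0x03, start_addr, count)
    return body + struct.pack("<H", crc16_modbus(body))


frame = read_holding_registers_request(1, 0, 1)
print(frame.hex(" "))
```

That is the whole request: eight bytes on the wire. Actually sending it needs a serial port library, which is left out here.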
I keep the A/C at 16 to 18°C simply because it keeps the bosses and PFY out of the machine room. Another thing we have to consider is that it does get too hot for the air conditioners to function. When the outdoor unit gets up to about 55°C the air conditioner's capacity is about half, until the safeties kick in and the cooling capacity goes to zero.
I'm not sure about servers but my PC and laptop act as great room heaters so I save on the room heating! Just make the whole building a giant heat sink for the servers and boom... 2 problems solved at once. I will have my check now please.
Fire because of said laptop when the extractor fan is accidentally covered.
Yeah, as a matter of fact I have seen a machine or two get hot enough that onboard thermal shutdown takes it offline - abruptly - with no regard to any sort of application-consistent shutdown. Lost more than a few drives to overheat too.
You may think we run cool because we don't know any better. Right... Apparently your equipment is special and doesn't mind the heat. Last time I checked, when a server room got above 45c for any length of time bad things started happening. In a small room with a few racks of gear (and no one on-site full-time to even open a door if needed) that happens damned fast. But hey - run your gear as hot as you like. I'm sure management will be thrilled with all the money you save and will be happy to share the savings with you.
If you look in your hardware manual, you'll find most hardware is rated at between 10C-45C or thereabouts, so 23C falls closer to the middle of the correct temperature range than, say, 16C.
We've had several machine rooms that ran consistently hot (anything from 28C to 35C), mostly because management did not want to pay for proper aircon, and in practice have found that failure rates are just not hit that hard. Not hard enough, at any rate, for reduced failure rate to cover the cost of increased aircon.
On the other hand, we mostly use fairly cheap hardware - if we were forking out $20k per box we might think differently about risking server failures.
Back in the old days, we had a small bunch of our company servers sat under a bench in an unventilated room with a SW-facing window. In summer (this is UK, so not like real summer) the room was too hot to enter, but we never had any downtime.
Some of those machines are still with us, but in a 20-degree C server room. They are still running just fine.
This company is small enough that saving money on AC *will* have an effect on my salary (or at least, won't make it any worse...)
UPSs really don't like elevated temperatures, it measurably shortens the life of the batteries.
Really big datacentres have a separate power room with UPSs and of course a generator outside; small computer rooms in offices usually have their UPSs in the same room. These rooms usually don't have carefully designed air-flow, so maintaining an *even* temp is much harder.
So, in a proper datacentre with proper air-flow I'd be happier to tolerate 25C at *all* the equipment's air inlets - i.e. no hot-spots. In a converted office computer room I'd run cooler, say 21-ish, to help reduce problems caused by poor air mixing.
As has been pointed out already, 48v DC powered servers are available (mostly for the telco industry). Incidentally, 48v feeds do not eliminate the need for switch-mode power supplies - virtually none of the electronics in a server actually runs directly at that voltage, and stepping down from 48v DC to the required levels isn't really much more efficient than stepping down from 240v AC.
However, there is a big potential saving in using DC-powered data centres, and that is because there is currently a lot of wastage in the area of non-interruptible power supplies. Most current practice has the inbound line power from the electricity supply (and from standby generators) being stepped down to battery levels and then stepped back up again to 240V AC. That double conversion is wasteful. Supply your servers via 48v DC feeds and you can avoid that step-up phase.
However, there is a fundamental problem of physics which makes supplying a data centre at 48v wasteful. As any schoolboy will tell you, to deliver the same power at 48V DC as you do at 240V AC (RMS into a resistive load) you have to deliver 5 times the current. As power loss in wiring goes as the square of the current, if you want to keep the power loss in the data centre wiring at the same level, the cross-section of the wiring would have to be 25 times larger. Wire is actually impractical in those cases - you have to use very thick and heavy copper bus bars. Start equipping a 2,000 square metre data hall with that sort of thing and you have a major problem.
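Putting absolute numbers on the 48V versus 240V comparison, as a sketch only: the 10 kW load and the 0.01 ohm feed resistance are hypothetical, and the 240V AC supply is treated as RMS into a resistive load, as stated above.

```python
def feed_current_and_loss(power_w, volts, feed_resistance_ohm):
    """Return (current in amps, wiring loss in watts) for a resistive feed."""
    current = power_w / volts
    loss = current ** 2 * feed_resistance_ohm
    return current, loss


# Hypothetical figures: 10 kW of servers fed through 0.01 ohm of wiring.
i240, loss240 = feed_current_and_loss(10_000, 240, 0.01)
i48, loss48 = feed_current_and_loss(10_000, 48, 0.01)

# Resistance is inversely proportional to cross-section, so the loss ratio is
# also the factor by which the copper must grow to keep the loss unchanged.
copper_factor = loss48 / loss240

print(f"240V: {i240:.1f} A, {loss240:.1f} W lost")
print(f" 48V: {i48:.1f} A, {loss48:.1f} W lost")
print(f"copper cross-section needed: {copper_factor:.0f}x")
```

The current goes from roughly 42 A to roughly 208 A, and the loss in the same wiring rises by a factor of 25, which is where the 25-times cross-section figure comes from.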
It is, of course, possible to run higher DC (more batteries in series) to avoid this, and still get the benefit of feeding your data centre at the non-interruptible DC battery levels, but 240v DC is extremely dangerous stuff (mostly a 240V AC shock will not prove fatal unless you are in poor health or are unfortunate). However, high voltage DC has the very dangerous feature that it tends to lock the muscles to the conductor whereas AC tends to make the muscles spasm and release.
Another approach is to provide each aisle of servers with its own non-interruptible power supply (i.e. batteries) and step down from the mains with 48v feeds to local servers, but that means putting a lot of lead-acid batteries into your data hall with your servers, not a great place to put something that needs maintenance and can produce nasty, corrosive fumes.
The issue of available DC-powered servers isn't so much of an issue. If there was the demand for it, then it is a trivial matter to equip a server or blade rack with a DC-DC power supply rather than the normal 240v AC-DC version.
Did no one read the piece that Google put out six months ago? They actually did this empirically: ran one datacenter at traditional temperatures, and another that was quite warm (don't recall the figures off-hand). Guess what? The hot one had fewer failures.
Shame this wasn't mentioned in the article.
... of running the server room warmer. At work, we don't have AC in the server room - the landlord won't allow the outdoor unit. Instead we just have forced airflow (which needs upgrading at the moment). OK, it's not a big datacentre, we only run at about 7.5kW.
What I can say is that the server fans run up when the supply temperature goes into the low 20's (˚C that is), and by the time the temperature is up to 25˚C then the power consumption is up by about 5%. If you find the detailed specs (such as in Dell's rack configurator) then you'll find that the power consumption for a server is known to increase with temperature.
Unfortunately, due to being 'on the limit' with the cooling, when the servers run their fans up, the result is that they suck more air than the ventilation can provide and hot air leaks in from the hot side - this increases the temperature and makes the server fans run even faster. We were a bit nervous when we reached a peak of 31.7˚ at the end of June - but nothing crashed (and no-one ventured into the hot end unless they really had to - 45.5˚)
Also, has no-one read Google's report on disk drive reliability - IIRC they determined that temperature did not have any significant effect on failure rate.
Very interesting article that appears to validate ASHRAE guidelines of 80F data centers (or even higher). The energy savings in the article are largely focused on reduced server and CRAC fan power, which is a significant energy saving. Operating central water chillers with a higher leaving chilled water temperature also reduces data center energy by increasing the chiller COP. By far the largest potential energy saving in data centers, related to all of the above, is the use of free-cooling chillers in central & northern USA, Canada & Europe. Using the outside ambient air temperature to provide partial or total data center cooling is a growing trend in large data centers, and also influences their geographical location. The governing design factor is the number of partial or full free cooling hours available at any location. Simply elevating the chilled water temperature to achieve 80F (or higher) in the conditioned space creates significantly more free cooling hours (when ambient is lower than chilled water temperature) in every geographical location, while at the same time maximizing the COP of the chiller(s). This energy-efficient design offers all the benefits of direct air economizers with none of the associated problems of security, filtration, contamination and humidity control.
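The free-cooling-hours point can be illustrated with a simple count. This is a sketch under stated assumptions: it uses only the rule given above (free cooling is available when ambient is below the chilled water temperature), ignores heat-exchanger approach margins and partial free cooling, works in Celsius, and the 24 hourly readings and both setpoints are made up.

```python
def free_cooling_hours(hourly_ambient_c, chilled_water_c):
    """Count the hours in which ambient air is colder than the chilled water."""
    return sum(1 for ambient in hourly_ambient_c if ambient < chilled_water_c)


# Hypothetical hourly ambient temperatures for one mild day, in Celsius.
sample_day_c = [8, 7, 6, 6, 5, 6, 8, 10, 12, 14, 16, 18,
                19, 20, 20, 19, 17, 15, 13, 12, 11, 10, 9, 8]

low_setpoint_hours = free_cooling_hours(sample_day_c, 10)   # colder chilled water
high_setpoint_hours = free_cooling_hours(sample_day_c, 16)  # warmer chilled water
print(low_setpoint_hours, high_setpoint_hours)
```

On this invented day, raising the chilled water temperature from 10C to 16C takes the free-cooling hours from 9 to 17 out of 24. A real study would run the same count over a full year of weather data for the site.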