eh?
Even my crappy five-year-old PC at home shuts down when it gets too hot, preventing it from getting damaged. How the hell did they get the server room so hot it fried the equipment?
An investigation has been launched at Leeds' famous St James' hospital after a server room disastrously overheated, permanently frying a new computer system for storing patient x-rays. St James, known as Jimmy's locally, is run by Leeds Teaching Hospitals NHS Trust. It confirmed that early assessments indicate damage has …
What kind of hardware doesn't have on-board temp sensors and self-protection systems that shut the machine down (gracefully or otherwise) to prevent permanent damage?
Unless, of course, these geniuses disabled them because they were shutting down all the time.
I remember installing a Plessey digital PBX exchange in Croydon Council way back when they were 'new'. Someone decided to put the air-con on full as the room felt a bit warm. The next morning, all the air-cons were covered in snow, preventing any air cooling & the temperature was over 50C. The switch didn't crash though.
A lot of the servers we use don't shut down when they overheat, but then we have monitoring and alerts for overheating, as clearly something is wrong when it happens. Usually it's a case of broken aircon. It's quite possible that the incoming airflow wasn't being cooled (enough, or at all), and as the warmer air was passed through the drive arrays (usually quite hot babies) it wasn't cooling them enough, so they continued to heat up further and further. A couple of disk arrays cooking themselves will do severe harm to this kind of system, I should imagine.
Well, NHS IT is specced to have extra fans and cheap power supplies...
Now the whole lot will have to be scrapped.
I know of at least one make of mobo that has just the one fan controller to drive the CPU fans, even though it's advertised as dual-CPU for redundancy! Anyway, the box gets hot, the fans spin faster, the controller draws too much power and, piff paff poof, the magic smoke is released and neither CPU fan works. Only then do they hit overtemp and shut down... D'oh.
First you build the server room, and make sure it's structurally sound.
Then you install the racks and servers. Turn them all on and make sure it's all working. Let it run for a little while to see that everything is OK.
Then, as the coup de grace, you install the air conditioning. This makes the servers more comfortable.
Especially in health care, you have to take pains to do everything in the above order, or else madness might ensue.
I just hope this does not happen any more; some of these medical records can help save people's lives. We need to move away from the hot servers with high-GHz CPUs and larger and larger disk stores that have become so prevalent in the last five years.
These power and cooling requirements are not sustainable. We could be heading for more meltdowns like the one at Jimmy's.
See: http://blogs.sun.com/ValdisFilks/entry/fighting_fire_with_fire
We need to be responsible IT people and build balanced systems from a power and cooling perspective.
Our air con in our server room at Notts City Hospital packed up a few weeks ago. Our entire building was stripped of desk fans to keep the room cool whilst the air con was repaired.
Quite funny actually, for me anyway, being a low-level tech... The sysadmin guys didn't find it quite as hilarious.
I remember a nice install of 48 V480s I did a few years back... After we provided the power and thermal specs of the machines, the company in question said "We'll look after that".
After replacing nearly all of the PDUs, which had blown while firing up the servers, they were finally in a position to power up enough of them to make the aircon units literally "sweat" all over the racks... we had to throw plastic tarp sheeting over the racks just to keep the water off, and you can imagine what that did to the airflow :D
Laugh? We nearly cried!
Our local college built a new extension and put in lots of "nice" RM PCs, 24 of them in a horseshoe. As the room was small, these PCs (desktop cases) were nearly touching, and they all had a fan on one side and a vent on the other. See where I'm going here?
Yes, the PC on the far right wasn't very stable...
I work for the NHS doing 2nd line desktop support based at The Causeway in West Sussex.
I visit the server rooms here on a daily basis to do various tasks: backups, patching ports etc.
All the server rooms I have had to visit so far have had very efficient cooling, most of them with at least three air conditioning units. They do run a few servers with the covers off, so airflow on those particular ones probably isn't too great, but it's within safe levels.
In our office we often joke about our colleagues being cowboys for whatever reason, but we do a job and we do it well. Do the guys at St James's actually employ professional cowboys? It makes you wonder.
... to my boss... We have several racks hosted by a hosting company.
The server room gets so hot, due to bad aircon, that all of us techies know to wear lightweight clothing and the thinnest t-shirts we can find when we go there.
We've told the boss... many, many times, but he didn't take any notice...
Not until the day two drives on the RAID 5 database server died at the same time. We'd have had a chance if the hot spare had had a chance to rebuild, but they went literally minutes apart. That's what I call tight manufacturing tolerances!
I worked at a huge Facilities Management centre which at one time allegedly had the biggest machine room in Europe. The air handling units were real monsters. They balanced the room by borrowing a thousand or so electric fires direct from the manufacturer.
The AHUs were pretty reliable but when they went titsup there wasn't much time to bring the machines down. Security was breached once when we had to open every door on a Winter's night and let the rain in. That was fun.
Working in IT in a university, I've seen this in action a few times. We moved into a new build a couple of years ago with our office next door to a machine room. Not only had the bean counters refused an extra £50k to flood-wire the multi-million pound building (surprise, surprise, causing problems within months of the building opening), they also only forked out for two under-specced air-con units for the new machine room. So when one failed... a portable air-con unit blowing hot air into the corridor made it kinda tropical for a couple of weeks while we waited for the repairs.
Of course they wouldn't do that in the new multi-million pound build that opened this summer. Surely. As if. No. Oh yes - another new machine room reliant on two air-con units, each of which cannot cope on its own. For an extra £7.5k we could have had three units, of which two would have been capable of cooling the machine room in the event of a failure.
Who knows what the cause was in this case but the bean counters have to be a fair bet.
Ian
From the description, this was meant to be a data storage unit, therefore in a *sane* world it wouldn't require multi-gigahertz-multi-core CPUs just to get the damn "operating" system booted. The required time to retrieve an x-ray image isn't likely to be measured in milliseconds, so stupidly fast (and therefore hot) HDDs wouldn't be needed either.
Tape backup units, of course, now they would up the price considerably... wouldn't want to deal with the shopping trolley of backup tapes that would be required for a full backup of such a system.
It really depresses me that we're paying for this kind of dumb-ass improperly specced, badly implemented (doubtless late) "solution" with our taxes.
I was thinking the same thing myself. I regularly buy Dell MD1000s with 7.5TB of storage (15 x 500GB SATA drives); these are used to record surveillance cameras, so are reasonably speedy, and cost about £3,250 a piece. The server to control six of these beasts weighs in at about £3k.
So we've got just short of 50TB of storage for £22,500. Lose a third for RAID 5 and you've still got storage for about £600 a terabyte.
So what exactly was in that room, a huge stack of gold plated SAS drives?
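A quick back-of-the-envelope in Python, using the figures quoted above and treating the RAID overhead as a parameter (the exact £ per usable TB depends on how the arrays are carved up, so this is just a sanity check, not anyone's official pricing):

# Quick cost-per-usable-terabyte check using the figures quoted above.
ARRAYS, ARRAY_TB, ARRAY_COST = 6, 7.5, 3250.0
SERVER_COST = 3000.0

total_cost = ARRAYS * ARRAY_COST + SERVER_COST   # about £22,500
raw_tb = ARRAYS * ARRAY_TB                       # about 45 TB raw

# One parity disk per 15-disk RAID 5 set, or the pessimistic "lose a third".
for overhead in (1 / 15, 1 / 3):
    usable_tb = raw_tb * (1 - overhead)
    print(f"overhead {overhead:.0%}: ~£{total_cost / usable_tb:.0f} per usable TB")

Either way you land in the few-hundred-quid-per-terabyte range, which is the point being made.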
******* From the description, this was meant to be a data storage unit, therefore in a *sane* world it wouldn't require multi-gigahertz-multi-core CPUs just to get the damn "operating" system booted. The required time to retrieve an x-ray image isn't likely to be measured in milliseconds, so stupidly fast (and therefore hot) HDDs wouldn't be needed either. *********
Bit of background: radiology images are stored in DICOM format, and these are large! A plain-film single X-ray will be up to 10MB in size, and new CT and MR scanners are putting out image sets that can easily be 1GB. While the radiologist producing a report has to see the full, uncompressed image, most clinicians in outpatient departments and on wards don't require this level of detail, as they are viewing the image in conjunction with a clinical report. As the time taken to throw gigabyte files around the (in some areas) ageing 10Mbit network is unacceptable, the images are converted to JPEG before viewing on a workstation. This is done in real time as the images are requested, and that does take multi-GHz, multi-core CPUs.
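For the curious, that kind of on-the-fly conversion looks roughly like the sketch below. This is a minimal illustration in Python using pydicom and Pillow, with made-up file names and a crude 8-bit window; it is not how any particular PACS vendor actually does it.

# Rough sketch of on-demand DICOM -> JPEG conversion for ward/outpatient viewing.
# Assumes pydicom, numpy and Pillow are installed; windowing is deliberately crude.
import numpy as np
import pydicom
from PIL import Image

def dicom_to_jpeg(dicom_path: str, jpeg_path: str, max_edge: int = 2048) -> None:
    ds = pydicom.dcmread(dicom_path)            # parse the DICOM file
    pixels = ds.pixel_array.astype(np.float32)  # raw pixel data (often 12/16-bit)

    # Crude window: scale the full pixel range down to 8 bits for JPEG.
    lo, hi = pixels.min(), pixels.max()
    scaled = ((pixels - lo) / max(hi - lo, 1) * 255.0).astype(np.uint8)

    img = Image.fromarray(scaled)
    # Downsample so a 10MB+ plain film doesn't swamp a 10Mbit ward network.
    img.thumbnail((max_edge, max_edge))
    img.save(jpeg_path, format="JPEG", quality=85)

dicom_to_jpeg("chest_xray.dcm", "chest_xray.jpg")  # hypothetical file names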
Also, these images, in our system, are stored on a shared SAN (cache) / CAS (archive) system that many other hospital systems use. Therefore the disks are specced to the highest requirement of those systems.
Having said that, if you install a fast hot server, whatever it's for, you must provide the correct environment for it! I would guess that the computer room in question also held Exchange servers, hospital information system servers etc, but that really doesn't make the headline as interesting, does it!
PS - While the implementation of PACS as an (eventually) nationwide system is new, the idea isn't. Leicester General Hospital has had a fully functioning PACS system since 1999 :-)
...How long it would be before the BOFH came under suspicion.
When the AC failed in my very modest server set-up (two or three RS/6000s, a Novell server, firewall, switches, routers, PABX and stuff, but packed into a fairly small space), the chief bean counter wondered, for days, whether the thing was really necessary!
In fiction, people resign. In real life we have to eat --- but I did write to the MD to the effect that I refused to be held responsible for the company's data until this was fixed. It worked, too!
but due to the Official Secrets Act I can't discuss which government agency I work for, but let me just say that they can't get the air conditioning working properly for people, let alone computers. The other week we spent days working in sweltering, boiling-hot conditions while they "found a spare part" for the air con.
If the UK government can't keep the air con working within legal "comfortable" limits, I dread to think what their server rooms are like.
Building a server room for a friend. He has 30 Dell 2U boxes. Rule of thumb: they draw 250 watts on power-up, 100 to run.
250*30 = 7,500 watts to power up after an outage; divided by 110 (it's the US...) = 68 amps. Could probably do it on three 20-amp circuits, but will use four for safety.
100*30=3,000 watts continuous power use. He bought an air con that removes 1000 watts. I asked him where the other 2,000 watts were supposed to go, got a blank look.
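The same fag-packet sums, written out in a few lines of Python (the 250W/100W figures and the 1kW aircon are the rule-of-thumb numbers above, not measured draws):

# Back-of-the-envelope power and cooling budget for 30 x 2U boxes (US, 110V).
NUM_SERVERS = 30
STARTUP_W, RUNNING_W = 250, 100   # rule-of-thumb per box
MAINS_V = 110
AIRCON_W = 1000                   # heat the aircon can actually remove

startup_w = NUM_SERVERS * STARTUP_W      # 7,500 W on power-up after an outage
startup_a = startup_w / MAINS_V          # ~68 A -> four 20 A circuits for margin
running_w = NUM_SERVERS * RUNNING_W      # 3,000 W of continuous heat load
shortfall_w = running_w - AIRCON_W       # 2,000 W with nowhere to go

print(f"Power-up: {startup_w} W ({startup_a:.0f} A at {MAINS_V} V)")
print(f"Heat: {running_w} W, cooling: {AIRCON_W} W, shortfall: {shortfall_w} W")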
I suppose he'll figure it out, or I will have a nice, warm place to go in the winter, at least until the servers have shat themselves. ;-)
All goes to show, it isn't rocket science here -- just too many managers and too little thinking.
from my experience the Trusts have their own server guys and get contractors to roll out the desktops + UBER LCDs for viewing the PACS-supplied X-rays....
So what's new? Government employees / dead wood who just coast and actually know nothing about what they're doing; they'll never get sacked unless they commit murder... even burning out a million quid's worth of equipment won't shift 'em.
NHS Trust IT staff are no better than the rest of the life-sucking Civil Serpents, if you ask me.
Probably by the end of next year, over 80% of the images shot in the US will be digital. Films are going the way of the dinosaur. And the quality of the images is increasing at a very high rate. Which means that the bit density is increasing, which means that storage requirements are going up. So those 6 TB (with RAID5) boxes will rapidly become too small. Medical imaging systems are about to go through a very, very painful stage.
Then there are those billions of films still out there. HIPAA requires that they be kept for seven years or, for minors, for seven years after they reach the age of majority.
...... is rocket science. I've lost count of the number of server rooms that I have built (the last for the largest single-occupancy building in Europe). The "science" behind a proper and secure environmental build is as basic as it can get. All that's needed is to take account of the heat output specs and design around that. It does not matter *what* the kit is, just how much heat it can generate. For example, take account of, or don't assemble in the first place, any *hot spots* in the room; account for external forces (the path of the sun on the building if it is glass-walled); allow for redundancy; that kind of stuff, and Robert is your father's brother.
That's the technical bit done. Next up: keep the Finance Director and his team in their cage. They are not the experts in this stuff and should not be allowed to dictate the design. Engage the MD or CEO at the outset and present artificial financial constraints as a Key Risk in the project. If he doesn't understand that, the project is pretty much in trouble from the off.
A properly run project would not allow stupid disasters like Jimmy's to occur. But then what Government project has ever been "properly" run? Ineptitude and corruption abound, and the results are entirely predictable.
Accidents happen, and this was an accident. We should stop bashing the organisations involved; the NHS is there to help us, so we should help them. I think we all owe the NHS a lot; I have had two broken arms fixed by them.
This is an opportunity to start thinking about data and cooling.
We can be smarter in the future and store the PACS data on low-temperature storage, e.g. tape. We need a smarter storage hierarchy where data sits on disk for x days and then goes to tape for months or years. There's no point paying to cool spinning disk that no-one uses.
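As a toy illustration of that kind of age-based hierarchy (not how SAM itself is configured, just the general idea, with made-up paths and thresholds):

# Toy disk-to-tape migration policy: anything untouched for N days gets staged
# off to the archive tier. Paths, thresholds and the archive step are invented.
import os
import time

DISK_CACHE = "/pacs/disk_cache"   # hypothetical fast, cooled disk tier
ARCHIVE_AGE_DAYS = 30             # "on disk for x days, then tape"

def files_to_archive(root: str, max_age_days: int):
    cutoff = time.time() - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getatime(path) < cutoff:   # not read recently
                yield path

for study in files_to_archive(DISK_CACHE, ARCHIVE_AGE_DAYS):
    print("would migrate to tape:", study)        # a real HSM would copy + stub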
I know this is a shameless plug, but we have a spanner/tool that fits the job, and people may just not know about it. SAM makes the media transparent, so your applications think all data is on disk. You can take a copy and send it to tape at another site, which does not need the cooling that disk does. Disk is good for short-term storage or active data; tape is good for inactive data. SAM glues it all together so you never know the difference.
The solution is here: http://www.sun.com/storagetek/management_software/data_management/sam/index.xml
An example is here:
http://www.serverwatch.com/hreviews/article.php/3696256
We're setting up a new server room in our basement. It's partly naturally ventilated but tends to be cooler than the ambient temperature outside.
Our MD doesn't want to install *any* aircon, just have the systems in a room with the doors open and a fan blowing. Outside temperatures sometimes reach 40C here.
Another server room is just an air-conditioned office without dedicated aircon (only the general office cooling) and is kept "cool" (~30C) by leaving the door open. Recently we've had complaints about noise and heat from those sitting near the open door. His response was to buy a floor-standing portable aircon unit and ask us to set it up to vent into the office corridor and close the door...
This for a room with around 35 systems in it - call it 5000 watts or so power draw.
Fortunately we've had a sanity check.
I used to work in a Boots store just after leaving Uni. For some reason the server for the store had been put in an office (Or the office had been put in the server room). People would walk in there to work and find it was too cold, so switch the AC off, then complain about how the system was so slow...
Also, I would like to point out that bean counters, despite what people think, never have a say in what is spent. They will tell bosses how much there is to spend (i.e. how much is in the bank) and then leave them to it.
is that your aircon status panel cannot fail in a condition you cannot spot.
Worked in a place with five VAXes in a room. Spring came and I thought it was getting a bit warm, especially behind the 8650. I told the sysadmin; he looked at the aircon panel and no failure lights were showing (three separate aircon units).
Anyway, on the first really warm day of the year I can't log in to anything, so I mosey over to the server room: the loading doors were wide open and the sysadmin was going around spinning down the disks. The console printers were going mental as each machine lost and then remade contact with the others (and each would report every other machine in the cluster doing the same thing)... eventually a full power-down.
I was right, it was getting warm. Basically, two of the units had bust, but the panel was bust too, so the failure lights didn't show. On the first really warm day, the final working unit found it all too much and gave up the ghost....
A few years ago.. I implemented a software system in a new state-of-the-art web-hosting centre for a global leader. The centre was built in a former print-works in London Docklands, and the design & decoration of the foyer alone cost £1 million. The only problem was that (being a former print-works) there were not as many solid floors as the data centre designers expected.. and when lots of holes were made for cabling the whole place became much more porous.. it froze the offices and baked the servers.
A couple of years ago, I worked at an Investment Bank that needed to open the data-centre doors and use fans to blow hot air into the car-park shared with the Investment Bank next door.
Let's put aside the (very) amateur carping and remember that there is nothing more mission-critical than life and death. £1 million is less than the compensation claims if somebody's children died because "the server shut down because it thought it was hot" or "the courier with the tape of your patient records has been delayed".
Spare a thought, when you criticise the CAS on-line disk archive (which powers down when not used), for the people wandering around with fried nuts and fried ovaries because the only alternative to an "on-line" store was to do another X-ray.
Cut them (I'm not involved with NPfIT) some slack; with the latest hot boxes, cooling is a big problem.. the only difference is that they are under the spotlight.
I've been in server rooms that were so hot, the servers kept shutting down. Air conditioning had been installed, but appeared not to be functioning correctly. The reason? The aircon was pushing cold air into the room, but there was no ventilation for the pressure to escape. So the aircon was trying to push cold air into a closed room, the pressure inside the room built up, and no more air could be pushed in, resulting in the aircon doing nothing except wasting electricity, and the servers overheating. How many server rooms are incorrectly set up like this? Could this have been the case at Jimmy's?
Stick core IT systems, i.e. things that really do mean life and death, with third parties that specialise in data centres, rather than keeping your servers where you can see them, and get on with looking after the patients.
It might have been fun planning and building a spare room into a "data centre", but it's an empire that delivers zero value to the hospital's customers.
Errrr, I'm not sure you have got to grips with how AC works, but it's not "pushing" air in through those lagged copper pipes! It's cooling air that stays in the room by circulating it; there's no pressure build-up! The pipes contain the heat-laden refrigerant, literally transporting just the heat out of the room.
Are you one of those people (usually women, sorry for the sexism, it's just an observation!) that says "Oooh it's hot in here, turn on the AC and open a window?"
AC works best in a sealed environment. Now, if you were talking about fans extracting air, then you need a neutral-pressure inlet that allows air to flow in at the rate it's pulled out, to avoid a negative pressure build-up as the fan tries to suck all the air from the room.
While I don't work for the NHS *directly*, my employer has a pretty close connection with them and all their policies, and from first-hand experience I can honestly say the people making the decisions and running the IT are nothing more than cowboys with little or no solid real-world experience of the massive projects they're expected to lead. If they're not technical wannabes, they're doctors/professors/business graduates with little or no experience (anyone can learn a script and repeat it parrot-fashion... the Head of IT demonstrated this skill very well in a recent meeting when I fired a few baited questions to test his knowledge. Neither he nor my boss caught on, while two equally knowledgeable colleagues stifled laughter for several minutes - whiteboard diagrams, buzzwords, corpspeak... you name it, he did it!).
Better still, these cowboys are regularly unavailable for meetings etc. as they're away on training courses or arguing with suppliers/consultants in a vain attempt to do things their way. There will be changes to the IT structure here in the next week or so, and I can only begin to imagine the even bigger influx of disgruntled users who can't access email because their region has to share a 1 or 2Mb ADSL line between 50-100+ users, while The Ranch (i.e. head office, where all the cowboys are) play with their overspecced and overbudget custom-built PCs and a 30-40Mb+ leased line. One of the few clued-up guys handed in his notice today for most of the reasons above, and there are other people thinking about going the same way.
Makes me sick to think my tax £££ are being wasted on such a joke.
"100*30=3,000 watts continuous power use. He bought an air con that removes 1000 watts. I asked him where the other 2,000 watts were supposed to go, got a blank look."
Will a single 60-watt lightbulb in a room with no aircon eventually make the room hotter than the surface of the sun? Would extracting 3,001 watts from a room using 3,000 hit absolute zero at some point?
OK, 3,000 vs 1,000 seems obviously insufficient, and it's good for fag-packet* maths, but don't forget the conductivity and heat capacity of the walls/floors/ceilings, solar gain through any windows, airflow, pressure pockets, insulation etc. It's not rocket science, but it is engineering science.
I've seen a server room (a real one, that could remove more heat than the servers could ever produce) cook because the aircon cooled pockets of the room, and when a server caught fire the airflow was so badly engineered that the smoke detectors only ever had clean, cool air blown over them.
A 3kW unit may not be good enough by itself; conversely, 1kW may be fine. For a hobbyist, just ramp it up to 5kW and you'll probably be OK. Probably. Without calculating for all these other factors you really don't know, and an extra $300 for an "overspecced" 5kW aircon won't be an issue, but you can't do the same thing on the enterprise scale. Perhaps somebody made this exact mistake: someone who thought it was simple, was working to a budget, added up the power usage and thought that would be good enough?
*fag packet, for US readers may not mean what you think it does.
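To put a rough number on where the extra heat goes, here's a crude steady-state model; the U-value, envelope area and outside temperature are invented, and the only point is that the room settles wherever losses through the fabric make up the cooling shortfall:

# Crude steady-state room temperature estimate.
# Heat in (servers) = heat out (aircon + losses through walls/ceiling/floor).
SERVER_LOAD_W = 3000.0   # continuous heat from the kit
AIRCON_W      = 1000.0   # what the undersized unit can remove
U_VALUE       = 2.0      # W per m^2 per degree C through the fabric (a guess)
ENVELOPE_M2   = 80.0     # wall/ceiling/floor area (a guess)
OUTSIDE_C     = 25.0     # ambient on a warm day

# Solve SERVER_LOAD = AIRCON + U * A * (T_room - T_outside) for T_room.
t_room = OUTSIDE_C + (SERVER_LOAD_W - AIRCON_W) / (U_VALUE * ENVELOPE_M2)
print(f"Room settles around {t_room:.0f} C")   # roughly 38 C with these guesses

Change any of those guesses and the answer moves a long way, which is exactly why it needs calculating rather than eyeballing.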
By Oliverh
Posted Friday 28th September 2007 13:34 GMT
***Errrr, I'm not sure you have got to grips with how AC works, but it's not "pushing" air in through those lagged copper pipes! It's cooling air that stays in the room by circulating it; there's no pressure build-up! The pipes contain the heat-laden refrigerant, literally transporting just the heat out of the room.***
I do know quite a bit about the way aircon works (it was part of my job for a long time). In many places, they have an "indirect" air conditioning system, where the aircon unit is fitted to an outside wall or on the roof. It takes air from outside the building and cools it. Then, there are conduits taken to where the cold air is needed. Fans push the cold air from the aircon to the required rooms. That is the sort of aircon I'm talking about. This includes virtually all supermarkets, hospitals, and large office blocks. If the room has no venting, you get a pressure buildup, as I described.
***Are you one of those people (usually women, sorry for the sexism, it's just an observation!) that says "Oooh it's hot in here, turn on the AC and open a window?..."***
Erm, no, I'm not one of those people. Having installed many heating and cooling systems in commercial establishments over the last 30 years, I do realise that a "direct" aircon system works best in a sealed environment (as in my car), but "indirect" systems don't work if the room is sealed. You can only push a limited amount of air into a sealed room before the pressure reaches equilibrium, then the airflow stops.
Server rooms are becoming really pretty difficult. Without knowing the exact details it is virtually impossible to know what went wrong. And it is harder to get right than many people realise, and vastly more costly too. It is fast coming to the point where to create from scratch an enterprise quality machine room, the price is going to be of a similar order to the cost of the equipment housed.
You can get a single rack of gear that dissipates 32kW. That is extreme, and is supplied with specialist cooling for the rack, but even less energy dense server systems can very quickly turn into power hungry monsters.
There is a huge and costly difference between "comfort" air and "data centre" air. A data centre system must be capable of running 24x7x52, and runs close to capacity all the time. It also provides humidity control. And is n+1 redundant. Trying to cool a computer room with a system designed to cool human beings is a very bad idea. Such systems are simply not designed to do it - with interesting design issues that cause premature failure if they are pressed into service. For instance a scroll compressor is incapable of coping with liquid refrigerant - it shatters the insides if it ingests fluid. So on a very cold morning, such an air-conditioner may fire up as the machine room comes on line, and starts to dissipate more heat. But things are still cool - and there is not enough heat in the machine room to evaporate all the refrigerant before it reaches the compressor. Exit compressor. After all - the designer of the air-con could hardly imagine that anyone would run it when it is freezing outside and only 20 degrees in the room. I know this one. The system in question has had four new compressors fitted so far.
We have just commissioned a new room for our computational clusters. 60kVA of UPS and 45 kW of (redundant) air-con heat rejection. Cost from starting with a bare concrete shell? 225,000 pounds. Now you get to spend money on servers.
Having done my time nursing rooms full of cooking supercomputers with old and cranky air-con failing in the early hours of the morning - and coped with the drastic drop in reliability of disks and systems in general that goes with it, there really is no other way. But finding the money is often insanely hard. In so many areas the money is typed. There is money for toys, but no money for infrastructure. In many businesses the money to lease kit is easy (spread out, tax effective), but money for capital works is impossible (pay now, no tax breaks).
Who the hell thought that was a good acronym? Sounds like someone trying to spit out a fly. NPIT would have been a) pronounceable b) readable and c) more accurate. I can just imagine the (endless) meetings that decided finally on sticking a stupid lowercase f in there. " Oh God, we can't use NPIT - think of all the jokes people will make" - maybe they should have thought of that *before* thinking up yet another dull name. Not joined-up, but definitely consistent.
As someone who knows more than a little about PACS, I find it entertaining to read this thread, but first, to put some details in place:
1) Yes, there is a bungling intermediate company - CSC (Computer Sciences Corporation), one of the four suppliers (together with Accenture, BT & Fujitsu) overcharging NPfIT/CfH by a factor of about four for the English PACS installations.
2) Yes, PACS does need a lot of storage, and it is rarely accessed after the initial few days, so tape would seem like a good idea, but jukeboxes are hugely expensive and unreliable, so disk has taken over. That said, anyone using SAN (or, even worse, that proprietary, expensive mess called CAS) for PACS data must have a corrupt financial reason for doing so - we shouldn't be paying more than £1k/TByte (I'm told the NHS is paying > £20k/TByte for off-site storage!)
3) There are now systems (e.g. MAID from Copan - http://www.copansys.com/) which only power the disks up when needed - avoiding power and cooling costs which really should be considered for this type of application (I don't work for these people - I just like the concept - others may do the same or perhaps even better).
So now we know that, when Tony the Blair-witch said that his government poured so many *EXTRA* billions into the NHS, this is what happened to those billions !!
ALL reports to date indicate that the patients are *NOT* getting any benefit from all that extra dosh!! From all I've heard, we'd have got far better value for money just buying a large herd of ponies and getting guys to ride from hospital to hospital with saddlebags full of X-ray films; and it's environmentally friendly, too!!
Perhaps we should adopt a concept first postulated by Larry Niven (science fiction author): any NHS administrator who specced an impossible system should be dismantled and his parts offered for free organ transplant to the needy. Obviously, the brain will be the most valuable part - one previous owner, hardly ever used!!
I worked in an academic research section on the site, and we have been advised by an insider that the air con didn't fail but that someone working in the room turned it off.
We haven't been told whether it was an IT-related person or someone else who had, or was given, access to carry out work.
Let's hope it wasn't the cleaner plugging in her Hoover!!