Even the Status page seems to be down.
Even The Planet's own site seems to be down now, as I can't access the status page link from this article either.
A fire at The Planet's H1 data center in Houston, Texas on Saturday has taken out thousands of websites. In messages posted on the web hosting firm's forum, the company blamed a faulty transformer for the fire. No servers or networking equipment were damaged, but the data centre remains without power, after The Planet shut …
To keep you up-to-date, here is the latest information about the outage in our H1 data center.
We expect to be able to provide initial power to parts of the H1 data center beginning at 5:00 p.m. CDT. At that time, we will begin testing and validating network and power systems, turning on air-conditioning systems and monitoring environmental conditions. We expect this testing to last approximately four hours.
Following this testing, we will begin to power-on customer servers in phases. These are approximate times, and as we know more, we will keep you apprised of the situation.
We will update you again around 2:30 p.m. this afternoon.
###################################
2:30pm CDT is about 8:30pm BST, so an hour and a half till the next update.
They have also said:
"We absolutely intend to live up to our SLA agreements, and we will proactively credit accounts once we understand full outage times. Right now, getting customers back online is the most critical."
Hmm, a crap power supplier not adequately monitoring the grid delivery? Sounds to me more likely that the mains power transformer has been running at 115%-plus of its overload margin for far too long!
Mind you, I have seen and heard of at least three go with a very spectacular bang in my area, causing massive local power outages for hours on end!
My backup server in H1 is down. I noticed that it timed out last night when transferring from my primary servers. Damn lucky that all I use it for is backup and secondary DNS. I had a server go down with 1&s****house1 last week, and I briefly toyed with the idea of moving those customers onto the backup server before setting up a new server elsewhere. Glad I bit the bullet and set up a new server straight away instead. Even more lucky is the fact that my experience to date with The Planet has been so good that I nearly set up the new server with them.
I'm willing to bet that The Planet will get the entire Houston data center back on-line faster than 1&1 can get my server with them back on-line. But then that's not much of a bet: the 1&1 server has been off-line for 14 days now! Muppets.
@Alan
It's the weekend. Come 10:32am Monday, if you're still off-line you'll know about it; that's how long it takes customers to realise that it's not their Exchange server that's the problem.
I'm not a webhosting or datacentre guy, but I was under the impression that there would be procedures in place to guard against this sort of outage - offsite stuff, redundant sites, etc?
If that's the case, is redundancy what is going to happen to the DR guys? :-)
Alan - maybe no-one likes your websites and doesn't care? ;-)
Steven "Only joking Alan" Raith
According to http://www.thehostingnews.com/news-dedicated-server-firm-the-planet-data-center-manager-garners-award-4306.html, "Mr. Lowenberg conducted a six-month trial to reduce power consumption and increase data center operating efficiency. Initial results demonstrate that while critical server loads increased by 5 percent, power used for cooling decreased by 31 percent. Overall, the company experienced power reductions of up to 13.5 percent through a broad range of improvements. The new green initiatives were conducted across its six world-class data centers." It also says: "The Planet operates more than 150 30-ton computer room air conditioning (CRAC) units across its six data centers. In one data center alone, the company was able to turn off four of the units. The cooling requirement on two of the units was reduced to 50 percent of capacity, while another nine now operate at 25 percent of capacity. The company also extended the return air plenums on all of its down-flow CRAC units to optimize efficiency."
So if the Phorm/Webwise system was operational, does that mean BT Broadband system goes t*ts up?
Oh dear BT, this Phorm/Webwise system is going to do wonders for your customer satisfaction. ... NOT!
BT customers could be leaving in droves!
(They possibly will anyhow, once they get a handle on the invasive nature of Phorm/Webwise interception of all their HTTP traffic and find out the history of Phorm (121Media) and its nasty spyware products.)
Paris, Because she loves a warm fire in her belly and she frequently goes t*ts up.
We host a site of medium-high importance. It has been our plan that as soon as we can financially afford to set up a duplicate rack in a totally different location, we will. Both sites will be load balanced and data replicated in real time. It adds at least 150% to the cost, but if you need that level of resilience you have to pay for it.
Data centres are better than hosting in your office but are vulnerable to outages, as many people know well. Our last outage was because some idiot (a data centre engineer?) switched off power thinking it was only going to affect someone else's rack. They may have fire-suppressing gas and the best security system, but there is no technology to prevent the employment of idiots. And there will always be idiots.
"I once had a tour of a data centre in the UK and was shocked to find their fire extinguisher system was "water sprinklers""
A dry-pipe water system, Vesda particulate detector, and continuous staffing are the usual approach.
The Vesda system sets off an alarm and a tech with a fire extinguisher goes hunting for smoke. This allows the usual sort of computer-based fire to be handled with little damage to surrounding servers (usually they just get the power dropped as the tech drops the rack's circuits prior to removing the smoking gear, taking it outside, then opening the box and applying the extinguisher).
The water system is for the last resort, usually from a fire in another part of the building reaching the computer room. It's not unreasonable for the insurance company to sacrifice the computer room if that saves the building -- anyway, they are paying for the damage to both so it's their call.
Gas got unpopular when CPUs got small, numerous and hot, and computer rooms got very, very large. If you think through the consequences of a cooling gas hitting a modern hot CPU, and the problems of venting released gas from a large space, you'll see why.
Fixed powder-based systems aren't a good fit to computers. An aerosol-based system would be a better fit.
Despite the double entendre in the title, that with which I need help is replication of a MySQL database over two servers at differing locations. I have read the MySQL manuals, but I would appreciate a pointer to a tutorial or a book which explains the procedures in more detail. Currently I am working with a 1 gigabyte DB, and I would like to mirror or replicate it so I don't lose everything next time a server-farm disappears...
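For what it's worth, the core of a basic master/slave setup is quite small once the master has binary logging (log-bin) and a unique server-id in its my.cnf, and the slave has been seeded with a dump of the data. Below is a minimal sketch in Python of pointing a slave at a master and checking that it's replicating. The hostnames, credentials, binlog coordinates and the mysql-connector-python driver are all just placeholder assumptions for illustration, not anything specific to your setup; the Replication chapter of the MySQL reference manual and O'Reilly's High Performance MySQL cover the details properly.

import mysql.connector  # assumes the mysql-connector-python package is installed

# (On the master you would first create a replication user, e.g.
#  GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'repl-password';
#  and note the binlog file/position from SHOW MASTER STATUS or mysqldump --master-data.)

# Connect to the would-be slave (placeholder host and credentials).
slave = mysql.connector.connect(host="slave.example.com",
                                user="root", password="secret")
cur = slave.cursor()

# Tell the slave where the master is and which binlog position to start from.
cur.execute("""
    CHANGE MASTER TO
        MASTER_HOST = 'master.example.com',
        MASTER_USER = 'repl',
        MASTER_PASSWORD = 'repl-password',
        MASTER_LOG_FILE = 'mysql-bin.000001',
        MASTER_LOG_POS = 4
""")
cur.execute("START SLAVE")

# Quick health check: both the I/O and SQL threads should report 'Yes'.
cur.execute("SHOW SLAVE STATUS")
row = dict(zip([c[0] for c in cur.description], cur.fetchone()))
print(row["Slave_IO_Running"], row["Slave_SQL_Running"],
      row["Seconds_Behind_Master"])

cur.close()
slave.close()

Note this is plain asynchronous replication: the slave can lag the master, so after a failover you may lose the last few transactions. If that matters for your 1 GB database, you need something stronger (or at least monitoring of Seconds_Behind_Master).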
I think you guys are missing the point; it's not about colocation or backups or power or any of that shite. It's about b3ta. What if there's no backup to the b3ta archive? It'd be like the Library of Alexandria all over again. 5:30AM on Monday and still nothing. I'm not a religious man but here goes... Allah wu Akbar, Allah the digital, the compassionate please restore the purple cock and domo.
"Allah wu Akbar, Allah the digital, the compassionate please restore the purple cock and domo." .... By Seán Posted Monday 2nd June 2008 04:27 GMT
Amen and Hallelujah to That Passionate Restore Point of Immaculate Imperfect Relevance, Seán.
Love ur dDutch. .... Real Get SMARTer IntelAIgents.
Here's a Virtual IntelAIgents Swap Shop/Treasure Vault ....... http://www.ams-ix.net/
I'll get my coat ....there's a CAB AI Called.
Work's blocking the Internet Archive's Wayback Machine; I can't even see if they've got older versions of B3TA squirrelled away anywhere!
This is ridiculous. Surely The Planet have backups, disaster recovery, that sort of thing?! Can you imagine if the Emergency Services said "Well, we can't actually man the 999 phone lines 24/7/365. We'll need a week off every so often, but we'll compensate anyone financially suffering from our unavailability"? Well, this is even more serious! 9.15 and B3TA is still down, people!
I was 3/4 of the way through coding a P2P message board that replicated the B3ta messageboard and would have mitigated this, but then I gave up through lack of interest. http://sourceforge.net/projects/b3ta
Will they never listen, think of the children, apologies for length or lack of.
An explosion of the local utility transformer took Rackshack's main DC off grid for 4 days a few years ago. Not a minute of downtime was experienced by 17,000 servers.
The subsequent write-up of the event showed both an amazing amount of pre-planning that initially kept everything going and fast adaptation to cope with unexpected consequences to keep it going. A long list of lessons was learnt at Rackshack. Were these all passed on to The Planet when it acquired them?
And anyone who has a mission-critical server without a geographically separate backup presumably doesn't understand the concept of backup, or why you have a minimum of two DNS servers. When those phones start ringing I hope they say "You are fired!". Putting clients' businesses at risk (like no email?) is just darn unethical as well as bad business.
I look forward to hearing any excuses ... from £60/month for a dedicated server, phrases like "a pennyworth of tar" come to mind.
It wouldn't take too much for a co-operative to be set up, distributing activities among a predetermined number of other hosts until the crisis is over.
Incidentally, would I have a legal case against b3ta for making me actually have to do some work on a Monday morning, and for the mental anguish caused by this?
"Allah wu Akbar, Allah the digital, the compassionate please restore the purple cock and domo." .... By Seán Posted Monday 2nd June 2008 04:27 GMT
Ensha Allah.
Or, in layman's terms: the computers were built thanks to Allah, the data was put there by the hand of Allah, the colocation duplication systems were denied by the mighty will of Allah, the fire was started by the great and merciful Allah, and the DNS servers are still down thanks to the esteemed and bountiful Allah. Allah be praised - and the rest of us thank fuck it wasn't organised by LizardGov.uk, otherwise the data centre would probably still only be half built, at half the original spec, for quadruple the cost.
Can we go to stoning now?
Well, I guess that's why we don't have our power transformers indoors then!
When they blow up they can do it peacefully in the car park while the UPS kicks in and prepares the generators for taking the load. When the generators kick in you see a mushroom cloud of diesel smoke; God knows what people think has happened when they see it!
I guess this is just a bad-luck story; I can see they are working hard to repair this and get them back online. Would you like to be the one to reboot 9,000 servers? lol
Definitely think they could have had a better disaster recovery plan in place. Seems like they only had a basic one and that's it...
The way to make it most resilient is to have two buildings kind of co-located (same business park etc) but not physically adjoined. So if one gets nuked the other can continue.
On the bright side, I think they must have saved a bit on the leccy bill... oops, where's me coat.
If our El Reg moderatrix will permit it (pretty please, Sarah), may I invite Dr Trevor Marshall and other interested parties to join us for discussions at:
http://www.opensolaris.org/os/community/ha-clusters/
and/or
http://blogs.sun.com/SC/
and look particularly for entries related to the Geographic Edition.
I have nothing hosted with them, but it sounds like they were doing fine until the fire dept. forced them to shut down their generators. True, it's not as good as total redundancy, but again, it sounds like they could have coped. Probably the generators weren't even a slight part of the problem--just playing it safe.
The systems commonly used are of the HI-FOG mist fire suppression type. The pipe work and nozzles are often mistaken for 'sprinklers' but in fact discharge a very fine mist that puts out the fire and is safe for humans and the hardware. Very common for DC and Telecoms applications.
Gas discharge systems are expensive and the older CO2 fueled systems can be lethal to humans in areas where there's no ventilation.
Paris....'cos she enjoys a good sprinkling every now and again.
Guys, this is nothing to do with poor disaster recovery. The transformers taking the power from the grid have blown up. This damaged the lines in the building and the floors going to the racks. No matter how good your disaster plan is... it won't allow for this scale of event. They have to replace power cables, etc., and the servers are offline until it is safe to turn them back on. This is a serious failure of power... not simply generators not working or a fibre cable being cut.
"nothing to do with poor disaster recovery"
Yes it is. Good DR requires that you have a backup installation sufficiently far from the primary site to withstand events like 9/11, New Orleans, Chernobyl etc.
If a few exploding transformers that take out some racks and cables put you off air, you do not have a valid DR plan. A UPS and generators might provide some measure of local high availability (HA), but they don't cut it for DR.
And yes, DR costs more than HA. Just like insurance, you have to pay for adequate protection, or pay the price. There are a number of reports around which show that ~40% of businesses without a DR plan go bust after a disaster. The rest have a very painful few years.
The nameservers for a particular domain really should be separated geographically and logically (network-wise). Getting a secondary nameserver is free or dirt cheap.
I sometimes hear people say "it doesn't matter much any more". This is rubbish. Having all of your nameservers down is much worse than just having a service like your website offline. With all of the nameservers unreachable, mail to the domain will eventually start bouncing rather than just queueing, and people visiting the website will see a message akin to "This domain doesn't exist". Non-technical users might be excused for thinking the company had gone out of business.
Run multiple nameservers in different parts of the world. It's cheap, easy and saves a lot of hassle.
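A quick way to sanity-check your own domains is to pull the NS records and see whether all the nameserver addresses land in the same network block. Here's a rough Python sketch, not a proper audit: it assumes the dnspython package, treats a shared /24 as "same network" purely as a crude proxy, and example.com is just a placeholder domain.

import dns.resolver  # assumes the dnspython package is installed

def nameserver_networks(domain):
    # Group the domain's nameservers by a crude /24 "network" bucket.
    networks = {}
    for ns in dns.resolver.resolve(domain, "NS"):
        host = str(ns.target).rstrip(".")
        for a in dns.resolver.resolve(host, "A"):
            prefix = ".".join(a.address.split(".")[:3])  # first three octets
            networks.setdefault(prefix, set()).add(host)
    return networks

nets = nameserver_networks("example.com")  # placeholder domain
for prefix, hosts in sorted(nets.items()):
    print("%s.0/24: %s" % (prefix, ", ".join(sorted(hosts))))
if len(nets) < 2:
    print("Warning: every nameserver is on the same /24; one outage could take them all down.")

If everything comes back in one block, that's the H1 scenario waiting to happen to your DNS as well as your web server.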
It's perfectly obvious that this fire story is all a cover-up: what's REALLY going on is that the governments of the English-speaking world, having awoken to the very real dangers of the impending recession, have struck pre-emptively and taken steps to increase office productivity by shutting down all known havens of timewasting. Mark my words, icanhascheezburger is next
"Good DR requires that you have a backup installation sufficiently far from the primary site to withstand events like 9/11, New Orleans, Chernobyl etc."
DR does not mean uninterrupted operation; it means a plan to get back in business within an acceptable amount of time. You have to be realistic and match your DR plans to the level of service you are offering, otherwise you will be out of the highly competitive lower/mid-end hosting biz very quickly.
This is a host with 50K servers; they lost 9K to this event. I believe they have six data centers, and AFAIK they are all in the Dallas area of Texas, taking advantage of the low power costs there. Following your logic they should have one or more data centers sitting idle in another state just in case of a catastrophic event such as this one. There's no way they could do that unless they were selling a much higher grade of service.
They are recovering from their disaster. Last time I checked, something like 2/3 of the servers were back up or in the process of coming back up (and that's in less than 48 hours), and there's a plan in place to temporarily get power to the rest that were directly affected by the explosion.
I have a website that's hosted at one of their other locations, and I am critical of the design that put management servers for my location in the center that suffered the fire, causing unnecessary disruption of service that would not have happened if the centers were independent.
It came back at 8.32pm BST. Apparently there are not too many more to go now before all the servers are back. Looks like B3ta will be left until last. LOL.
The second floor is running on mains power again, but due to damage to the underfloor power conduits the first floor is all on generator power and will be for the next 10-12 days. Ouch! Hope they've bought plenty of diesel and a mechanic.
I agree that appropriate DR needs to be matched to the service they are selling. If their customers are happy with an SLA that allows a 48+ hour outage then that's fine. The people I deal with will get upset (putting it mildly) over a 2-hour outage.
There's no need to have an idle data centre elsewhere, though. It could be doing useful work, with some spare capacity ready to pick up the load from a site that fails, giving reduced service rather than a full outage. As with all HA/BCDR solutions, it's a trade-off of cost versus RPO/RTO, matched to the service agreement that you're selling to your customers. The likes of the Nasdaq or the NYSE will have very different DR requirements to a small company that will be only mildly inconvenienced by a two-day outage.
Personally I wouldn't trust my business to a company with all its data centres in one city, though. There are way too many possible common-mode failures there.