
Another failure, along with Office 363
Microsoft's cloudy platform, Windows Azure, is experiencing a major outage: at the time of writing, its service management system had been down for about seven hours worldwide. A customer described the problem to The Register as an "admin nightmare" and said they couldn't understand how such an important system could go down …
I remember that MS have/had their own way of working out which is the last Sunday in October. It's the 4th Sunday. It always is (except for those Octobers with 5 Sundays...). Cue the time being different between desktop Windows machines and back-end servers, causing all sorts of headaches.
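For anyone curious, here's a minimal sketch of the difference (plain Python, and emphatically not Microsoft's actual code): the textbook end-of-BST rule is the last Sunday in October, and the "4th Sunday" shortcut only goes wrong in Octobers that have five Sundays.

```python
# Minimal sketch (not Microsoft's code): compare the correct "last Sunday in
# October" rule (end of British Summer Time) with the "4th Sunday" shortcut.
import datetime

def last_sunday_of_october(year):
    """Correct rule: walk back from 31 October to the nearest Sunday."""
    d = datetime.date(year, 10, 31)
    while d.weekday() != 6:                  # Monday=0 ... Sunday=6
        d -= datetime.timedelta(days=1)
    return d

def fourth_sunday_of_october(year):
    """The shortcut described above: just take the 4th Sunday."""
    sundays = [datetime.date(year, 10, day) for day in range(1, 32)
               if datetime.date(year, 10, day).weekday() == 6]
    return sundays[3]

for year in (2009, 2010, 2011, 2012):
    correct, shortcut = last_sunday_of_october(year), fourth_sunday_of_october(year)
    note = "" if correct == shortcut else "  <-- five-Sunday October: clocks wrong for a week"
    print(year, correct, shortcut, note)
```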
Yeah I read the comment. It said something about MS calculating Leap Years using a non-standard method that is wrong, hence their problem.
Then I made a comment that MS also decided to do another calculation differently from the standard, so that too is sometimes wrong (causing problems with British Summer Time).
I'm not an Apple fanboi, so I don't see what's wrong with poking fun at MS's inability to calculate things correctly (and we also shouldn't forget their spreadsheet program, which has had problems calculating in the past!)
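And for completeness, the textbook Gregorian leap-year rule the thread keeps coming back to, shown as a minimal Python sketch; this is the standard rule, not a claim about what Microsoft's code actually does:

```python
# Minimal sketch of the standard Gregorian leap-year rule (not Microsoft's
# actual code): divisible by 4, except centuries, unless divisible by 400.
def is_leap_year(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# Quick sanity checks of the rule.
assert is_leap_year(2012) and not is_leap_year(2013)
assert not is_leap_year(1900) and is_leap_year(2000)

# The classic leap-day trap is naive "add one year" date arithmetic:
# 29 Feb 2012 + 1 year is not a valid date, which is reportedly the kind of
# thing that bites certificate-expiry calculations on a leap day.
```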
Wholeheartedly agree. I am absolutely NOT sold on this marketing gimmick called 'Cloud Computing', based on outages reported by major providers besides MS.
As with all IT marketing campaigns, this looks really good in the PowerPoint presentations and balance sheets but in practice is a business continuity disaster.
... the idea of being married to an MS engineer might not be so bad.
Granted, there's the appalling dress sense, the immature world view, the lack of stimulating conversation, the endless teeth-curling jargon and the permanent I'm-a-middle-manager-and-I'm-your-friend fixed shit-eating grin and patronising tone of voice which might obviously put you off.
But then again, there's lots of housekeeping money, and best of all, they work all day, all night and all weekend, so you get the house to yourself!
I'm not one for up-voting or down-voting, but you obviously missed the point. It's not that they didn't have a sense of humor. It's that you mixed acceptable "jokes" in there with unacceptable ones. If I say a nerd has "an appalling dress sense", "teeth-curling jargon", or even "a lack of stimulating conversation", they're likely to laugh, say "yep, that about sums it up" and shrug it off. But you went too far with "immature world view" (which is just vague and meaningless enough of an insult to apply to anyone) and calling them middle-managers. No one likes to be called immature or a middle-manager (especially when you already called them engineers, WTF?), thus the down-vote from two nerds. Your joke lacked consistency and the finesse to expect everyone to take it lightly and move on. And perhaps you should do just that with your two downvotes :).
I work with the boys from Redmond (seriously, I have yet to work with a woman from MS) on BPOS on a very regular basis and immediately recognized the poster's point. There is something different about most of the MS folks (always exceptions to the rule of course!) vs. the rest of the industry.
It's a bit like, thinking back to the University days, if Nerds had their own Fraternity - Microsoft would be it.
I've worked with a number of Microsoft developers and other technical types, and I've always found them to be pretty much like various non-Microsoft people I know in the industry. Generalizations about large groups of people are always suspect, and yours (even vague as it is) certainly doesn't match my experience.
Yes, this is somewhat of a troll / joke but still...
Every major cloud provider has had some extreme outages. Amazon has been down and out for a few days. That UK cloud provider (can't remember the name) has had downtime of more than 2 weeks. Google has had issues and gone down for a day.
And now Microsoft's cloud suffers from the same. As such, my conclusion: they're finally becoming a real provider to contend with ;-)
All clouds die ... all software crashes ... and hardware fails. It's just the way things are so quit bitching about it and go down the pub and have a beer, play scrabble, chat up the girl in the cubicle next to you, go out to lunch.
Chances are it will be back up again soon and then you can get back to work - enjoy the moment.
To say [Microsoft | Apple | Google | Amazon (choose your preference) ] is a rubbish platform and if users had only gone with [ insert your other provider of choice ] their world would be rosy.
Troll / bitch, platform fanbois rant.
There, I've saved all you folk who were going to have a rant at MS the trouble.
(P.S. I dislike MS as much as the next guy, but the 'my dad can beat up your dad' argument gets tiresome.)
If you care to read most of the comments, you'll not see bitching at a particular cloud/outsourced/managed services/whatever this year's marketing buzzword is/etc provider, but at the idea that this cloudy thingy is some sort of panacea to all your IT availability issues, and you can throw away any other business continuity solution.
It isn't. Factor in proper BC, Due Diligence over the providers' offerings, and service failure insurance (ha!!!) and suddenly all those putative cost savings go Poof!!!
Unless, of course, your name is Matt Asay......
> Vista in 2012? Are your IT Support guys snails?!
I've recently been contracting for a company that is rolling out a managed image to its desktops. It's been a slow start, but they're now getting there.
The image is Vista. I met precisely zero users who were happy with it - some saw XP as more performant, others saw Win7 as a more usable choice. But no, Vista was the thing on the cards.
I found an old box & put Fedora on it. I had quite a few users by the time I left :-)
Vic.
I retired my last Vista machine last summer, but only because it was a laptop, the keyboard wasn't working properly (hardware failure), and it was no longer under warranty. If it hadn't suffered the keyboard failure I'd likely still be running Vista on it. I've read hundreds of these complaints about Vista, but frankly I never found it that much worse than XP, Server 2008, or Win7. I find them all frequently annoying, when they're not working well enough to be invisible.
Of course, my Vista installations were highly customized (security policy, group policies, etc), and I do most of my work in Cygwin bash sessions with vim and command-line tools, so perhaps I'm not the typical user.
But certainly if I were still running Vista I wouldn't want to waste even a day upgrading to Win7. I didn't when I was running them side-by-side.
This post is provided as a community service, trying to save the Reg some storage and bandwidth for worthwhile discussions instead of the usual server-hugging, cloud-hating commentard reactions.
"Never trust your data to a cloud, I need to have absolute control over my data"... yeah, as if you were using 100% in sourced data center, tape storage management, disaster recovery, or help desk services. Remember, the vast majority of leaks and security threats come from people inside your organization.
"Their SLAs are a joke"... yeah, as if your uptime was 99.999% Oh well, this maybe is true of a few banks and financial institutions and perhaps a secret agency or two. The rest of us know that to have even the same uptime as the Amazons, Googles and Microsofts needs a disproportionate amount of investment.
"They are not responsive when things goes wrong"... really? Have you ever asked your users how they feel about your responsiveness in the event of an incident? Do you have a dashboard providing real time status updates each time one of your self managed and self hosted apps or services is down? How has been your incident management and recovery time measured lately?
It's all about the feeling of losing control, I know. But face it, sooner or later it will happen. In a few years time, the idea of someone providing basic services from its own hosted facilities will sound simply crazy and wasteful.
If the service has been down for 7 hours already, they'll struggle to achieve 99.9% availability for the year. Actually, before I escaped from corporate IT, I was achieving 99.997% availability across the UK - and that was in 1996 with Win3.1/Netware4.1/mainframe (IIRC the .003% was BT losing a Kilostream link to Preston for an afternoon).
Guaranteeing 99.999% can get expensive (and adds complexity) and is probably more than most businesses require. But it's not rocket science.
> 99.997% availability is like 15 minutes down in an entire year?
Yes. Well calculated.
Customers frequently demand "five nines" availability (99.999%) until they see the cost. But 15 minutes a year will usually give you one reboot even if you haven't got spare hardware (which you should have).
I have a policy to reboot at least every 1000 days. Just because...
Vic.
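To put numbers on the nines being thrown around in this thread, here's a quick back-of-the-envelope sketch in Python; the 3,120-hour figure is the 10-hours-a-day, 6-days-a-week service window mentioned below:

```python
# Quick sketch of the downtime arithmetic in this thread: permitted outage
# (in minutes) per availability level, for a 24x7 year and for a
# 10 h x 6 d x 52 wk service window (3,120 hours).
def downtime_minutes(availability_pct, service_hours=365 * 24):
    return service_hours * 60 * (1 - availability_pct / 100)

for pct in (99.9, 99.99, 99.997, 99.999):
    print(f"{pct:>7}%   24x7: {downtime_minutes(pct):7.1f} min"
          f"   10h x 6d: {downtime_minutes(pct, 3120):6.1f} min")

# 99.9%   of 8,760 h is ~526 min, so a 7-hour outage eats most of the budget;
# 99.997% of 8,760 h is ~16 min (the "15 minutes" above);
# 99.999% of 3,120 h is under 2 min.
```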
To be pedantic (and why not?), our service was based on 10 hrs a day and 6 days a week, so 99.997% is just 5 minutes. Preston was down for 4-5 hrs, but it only represented ~1% of the work force, hence the 99.998% (can't remember where the other 0.001 went). We didn't count (though we measured) individual workstation failures, since we effectively had roaming profiles and no data on the local drives, so if your PC died, you just used a spare.
True story: PCs would occasionally behave erratically, which a reboot would cure. It turned out that there was a bug in the Netware client for Windows which was failing to release a user handle after each logoff - after 25 or so logon cycles there were no handles left. Most of our users logged on once a day, so Windows 3.1 had to run for a month without a reboot to show the problem. Try telling that to t'youth of today ... and they won't believe you.
Thanks to your pedantic comment, we now know that it all comes down to your definition of "uptime". If my numbers are not wrong, according to your definition (10 hrs a day, 6 days a week, 52 weeks = 3,120 hours), to meet 99.999% uptime out of those hours your entire user community could be without service for less than two minutes during the whole year!!!!
And you had a location with 99.84% availability, but that was diluted in the overall user count. My feeling is that the (few) people at Preston laughed at your claims of 99.999%.
Well played on your part, and your butt was probably saved because all those Windows reboots and dead PCs did not count against your "uptime" measurement. But they surely did count against the business value of that service.
The point is still valid: while not rocket science, the true, actual, business definition of five nines is extremely expensive. Just to make it clear, making those five nines actually valuable from a business perspective would mean, in your case, that no Windows 3.1 PC could be dead or in need of a reboot more than a couple of times per year.
And let's be honest, most business types can go beyond wishful thinking when facing the true costs and accept that they can do with four or even three nines without too much disruption. It's all about the cost and the benefit.
Yes, SLAs are agreed with the business and inevitably include some degree of averaging over space and time (our agreed level was 99.99% measured annually, which we comfortably exceeded). If the business had required 99.99% availability at each individual location, we'd have arranged for multiply routed WAN links (ideally from separate suppliers, but that was difficult to achieve nationwide back then). When we'd shown the bosses the costing figures, I'm sure they would have settled for a lower guarantee.
PCs dying was rare, but still more common than reboots. I'm not sure it's possible to guarantee 99.999% availability to an individual workstation - you'd not only need a UPS but also lots of dual components. Does anyone make a desktop with dual power supplies? (If they do, I bet it's pretty expensive.) In reality, all you had to do was walk across the office and use the PC of a colleague who was away. A replacement would arrive within minutes at a big office (where spare systems were held), but at Preston you might have to wait a day for repair or a replacement to be shipped.
Typically we apply patches and upgrades to a test server before company-wide deployment. I cannot see why they cannot do the same thing with datacenters. If they bugger up, pull it off the grid and let the rest tick along.
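What this commenter is describing is essentially a canary rollout. A minimal sketch of the idea, with hypothetical deploy_patch()/is_healthy() helpers; this is not any real Azure deployment tooling:

```python
# Minimal sketch of "patch one data centre first, check it, then do the rest".
# deploy_patch() and is_healthy() are hypothetical callables, not a real API.
import time

def staged_rollout(datacenters, deploy_patch, is_healthy, soak_seconds=3600):
    canary, rest = datacenters[0], datacenters[1:]

    deploy_patch(canary)                # patch the test/canary site first
    time.sleep(soak_seconds)            # let it soak before judging it
    if not is_healthy(canary):
        return [canary]                 # buggered up: pull it off the grid,
                                        # the rest tick along unpatched
    failed = []
    for dc in rest:                     # canary looks fine, patch the rest
        deploy_patch(dc)
        if not is_healthy(dc):
            failed.append(dc)           # isolate any stragglers
    return failed                       # list of sites pulled off the grid
```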
> Vista in 2012? Are your IT Support guys snails?!
We still run WinXP on most of our workstations, even the shiny new ones. Many of our mission critical software vendors do not support the later versions of Windows clients yet. The virtual WinXP mode on Win7 does not work with one of our major database packages so no go on upgrades for now. Not everything is within the IT Folks control. Well, not counting the BOFH, of course.
there are a lot of power buttons to turn on and off in the cloud
rough calculation:
-3 seconds to move to the next computer and press the reset button
-100,000 computers in the cloud (guess)
= 83 hours!
So I imagine they have a team of 10 or so, which explains why it took 8 hours to fix.
Not so fast: it's 'only' the management interface that's down, so your app should still work fine.
BTW: management is up now - 14:40 as I write this - provided you don't want to manage any Database, Datasync, Reporting or Service Bus, Access Control and Caching settings.
So, just the Hosted Service, Storage Account & CDN or Virtual Network configs available then.
What's the betting this resolves itself as the data centres move out of the Leap year 'danger zone'?
No, actually, I LOVE to say "I told you so" - clouds blow on the wind. How can anybody run a business on that basis? NOT ME, for one.
It's the same with outsourcing to some country where they cannot even speak to your users without communication problems. Been there, experienced that, thank you very much.
FAIL!!!
You know, I get criticised for overbuilding my network and hosting topology through the software layers, and this is the reason why I spend so much: so we can deliver the highest SLAs on the market. At Savtira we have 10 data centers distributed throughout the USA/EMEA; they are load-balanced at the DNS level and go as deep as checking that SSL is responding correctly before sending consumers to a replicated server. It's self-healing and easy to maintain, since you can just take an entry out of DNS to do maintenance on an entire data center, and in the event of a problem it auto-fails-over to the next closest data center... we just sent out our SLAs (I created the first-ever network/hosting SLA when I founded Savvis).
http://www.savtira.com/press_release.php?n=84
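For what it's worth, here's a rough sketch of the DNS-level health-check-and-failover idea that post describes, in Python; the hostnames and distances are made up, and this is not Savtira's actual implementation:

```python
# Rough sketch of the idea described above: probe each data centre over HTTPS
# (so the SSL layer is exercised too) and hand the consumer the closest one
# that answers. Hostnames and distances are invented for illustration.
import urllib.request

DATA_CENTERS = [                     # (hostname, rough distance from consumer)
    ("us-east.example.com", 10),
    ("us-west.example.com", 40),
    ("emea.example.com", 90),
]

def responds_over_ssl(host, timeout=3):
    """True if the site answers an HTTPS request (certificate checked too)."""
    try:
        urllib.request.urlopen(f"https://{host}/", timeout=timeout)
        return True
    except Exception:
        return False

def pick_data_center():
    """Closest healthy data centre wins; fail over to the next closest."""
    for host, _distance in sorted(DATA_CENTERS, key=lambda dc: dc[1]):
        if responds_over_ssl(host):
            return host
    raise RuntimeError("no data centre is responding")
```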