Ouch
When SDN goes bad hey? You gotta almost feel sorry for them.
European web hosting outfit OVH has reported its second major outage and Total Inability To Support Usual Performance* in a month, and admitted the new outage was caused by its attempts to fix the cause of the last one. OVH attributed its November outages to power problems and cable cuts. But this incident notice filed by …
Simon, you were going OK until almost the last paragraph, then you felt the urge to go with this gratuitous snipe:
"But there'll also be plenty who were impacted, and irritated, and wondering why they give their business to a company that's also experienced flood damage and can't configure routers well enough to avoid this sort of thing."
Maybe you would like to stroll over and show them how it's done?
I am not a customer of theirs, but when one of my suppliers drops the ball I do not shout at them: I offer to help them¹ get back on their feet so that we both can carry on with our respective businesses. Everyone makes mistakes and we need to factor that into the equation, else we're not running our show properly.
¹ Usually a great way to do this is to leave them alone so they can spend their time actually fixing the problem, something to which answering questions from customers does not usually contribute.
Except in the sense that ignoring customers when your company has f**ked up is a good way to ensure they stop wanting to be your customers.
And if enough of them do so with your business then pretty soon you're not going to be needed anymore.
Keeping customers (internal or external) informed should be part of any generic "S**t-hits-the-fan" DR plan. IT should not have to do it but there should be some kind of SITREP process that can be used to inform customers and someone on both ends of it who deals with it.
Let's be real. S**t will hit the fan. Anyone who's thinking "That never happens to us" is deluding themselves. It has simply never happened to you yet. So fail to plan or plan to fail.
> Except in the sense that ignoring customers
Where does the above say anything about ignoring customers? Of course those affected (customers or otherwise) by an incident need to be kept informed, and that is a standard part of any contingency plan.
However, customers (or anyone else) just jumping up and down and calling every five minutes thinking that is going to get things fixed any quicker is counterproductive.
> Maybe you would like to stroll over and show them how it's done?
Evidently, quite a few people here could. You do not go around rolling out patches and upgrades like this on primary production systems. You have a staging environment, which is also your tertiary failover system. Once you're happy that staging is updated and apparently idling happily, you temporarily promote it to secondary and then do a failover test (which should be a routine, monthly event) by taking the primary offline. If things go TITSUP then the regular secondary system cuts in, you immediately bring the primary back up again and investigate at leisure.
The point is, you always have three levels of redundancy and you always have two systems in known good (as in previously production tested) configuration. This isn't rocket science. It's a simple, sequential procedure. It costs money, of course, and it may not represent appropriate ROI for every business, but in that case, say so and don't act all surprised when things crash and burn - otherwise it just looks like incompetence rather than the commercial risk/benefit/cost calculation that it (hopefully) actually is.
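For what it's worth, the monthly drill described above can be sketched in a few lines. This is a minimal illustrative simulation only - the node objects and `health_check` probe are assumptions for the sketch, not anything OVH (or anyone else) actually runs:

```python
# Minimal sketch of the three-tier failover drill described above.
# Nodes are plain dicts; health_check is a stand-in for a real probe
# (ping, HTTP check, BGP session state, etc.) - all hypothetical.

def health_check(node):
    # Stand-in for a real health probe.
    return node["healthy"]

def failover_drill(primary, secondary, staging):
    """Patch staging, promote it to secondary, take the primary
    offline, and verify traffic would still flow."""
    # 1. Staging must be updated and idling happily before anything moves.
    if not health_check(staging):
        return "abort: staging unhealthy, nothing promoted"

    # 2. Temporarily promote staging to secondary, keeping the old
    #    (known-good) secondary in reserve.
    old_secondary, secondary = secondary, staging

    # 3. Failover test: take the primary offline.
    primary["healthy"] = False

    # 4. If the promoted node copes, the drill passes; otherwise the
    #    regular secondary cuts back in.
    if health_check(secondary):
        result = "pass: promoted staging carried the load"
    else:
        secondary = old_secondary
        result = "fail: reverted to known-good secondary"

    # 5. Bring the primary back up and investigate at leisure.
    primary["healthy"] = True
    return result
```

The key property of the sequence is that a known-good system is in reserve at every step, so a botched promotion never leaves you without a fallback.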
Are you seriously suggesting they build three backbone networks instead of one?
Your approach works very well with servers. It doesn't work for networks.
There's no difference. Tier IV is defined in terms of overall system resilience and ability to mitigate TITSUP conditions. It's irrelevant whether it is the network or the servers or the HVAC that goes down. All critical components must be fault tolerant and/or redundant. If that means you need to string a whole new fibre pipe across the Atlantic then that's what you have to do.
However, repositories of cat photos don't exactly justify Tier IV, so I would never expect most businesses to invest in that. What I do expect them to do (as I noted at the outset) is to say what service level they're aiming for and not act all dazed and confused when their Tier II (or I) infrastructure crumbles beneath them. You expect that with Tier II. That's what defines it as Tier II. That's why it's (comparatively) cheap.
> There's no difference.
While not disagreeing with the general idea behind your post, not all systems are equal and in some cases (not necessarily OVH's, I do not know) working on live systems is unavoidable.
I knew of a heart-lung machine repairman, whose job was to fix the thing when it broke in-theatre. Apparently the guy was an ace with a soldering iron.
> While not disagreeing with the general idea behind your post, not all systems are equal and in some cases (not necessarily OVH's, I do not know) working on live systems is unavoidable.
I completely agree. I "beta test in production" regularly and, as expected, I regularly take down said systems because of bugs and mistakes. The difference is, those are non-critical systems and I send out messages several days beforehand saying that the system is scheduled for maintenance and should be expected to be offline both during the maintenance period and immediately afterwards because work might overrun (translation: we might cock it up or encounter an unforeseen problem).
> I knew of a heart-lung machine repairman, whose job was to fix the thing when it broke in-theatre. Apparently the guy was an ace with a soldering iron.
High-pressure job. I bet he wasn't handing out 99.98% patient survival guarantees though. There's nothing inherently wrong in working with no safety net, it's just unprofessional to act all surprised when you eventually come crashing down and break something. If they advertised: "OVH is a Tier I service provider with DR provisions as limited as our fees." then I would have no problem with that.
Well, there is no backbone network built to your principles.
Even when you build everything important fully redundant, the way the routing protocols work means that a single configuration error or software bug can bring down the entire thing. See the Level3 disaster a number of years ago.
There is also no backbone network built with enough vendor diversity that a single bug (such as, say, configs magically disappearing) won't have widespread effects. When it comes to fancier features, interoperability is still so crappy that you need to stick with a single vendor to use them.
The only alternative would be having two identical (but different-vendor) separate networks in an active/passive configuration. And for obvious cost reasons, no one even considers doing something remotely like this on a backbone-wide level.
I've worked development on several large bespoke systems. Some had complete development and test environments, some did development testing on the live system.
The latter were substantially more stressful to work on.
So some people do.
But if you're at the design stage it's much better to set up a way to switch the whole system to a "test" company and make all (well IRL as many as possible) of your mistakes in that system.
Defiant - can you back that with facts?
OVH proactively monitor outbound mail from services, and if you're a spammer you won't be online for long - repeat and you'll be out.
I know it's great fun to be able to throw some "cool" comment on a topic you don't understand - but try not being that twat... I think you'll find life is much more fun.
Hmmm,
As has already been said, spammers get nuked very, very quickly nowadays. The problem I have had a few times, though, is getting an IP that was previously used for spamming.....
But guess what: 30 seconds typing a support ticket and hey presto, new IP block :O
I really don't get a few of the commentators in this thread. You either have no idea who OVH are, or do know but don't realise what they have actually done and become....
I repeat my sign off from the last post
F*cking idiots
OVH won't accept the return of any IP which is blacklisted or has other reputational problems attached to it.
They force you to keep it until it's clean - and you pay a monthly fee for the IPs as you are no longer using them on an active machine.
So your theory is b*ll*cks. F*cking idiot.
In addition, when you ask for a new block, they will see you've previously abused and you won't get one.
Once you get caught at OVH you don't get second chances. They don't want your business - you're out.
Also... it may take you 30 seconds to write the support ticket, but one thing OVH are not so good at is responding to email tickets... so it could be a couple of weeks before you get that new IP block via ticket.
So I believe it is you who have no idea who OVH are, what they have done and become.
Put down the keyboard and walk away...
Wow, big claims. Can you prove this?
OVH have some of the most comprehensive and open monitoring tools I've seen of any major provider.
Status site: http://status.ovh.net/ (detailed breakdown for all services)
Network Mon: http://smokeping.ovh.net/ (monitoring across hundreds/thousands of links)
I believe they are currently the third largest cloud provider in the world, and the largest based in Europe. So based on how stats work, I'd kinda expect to see them in firewall logs more regularly than the competition.
Sure, being good value they will attract a certain type of customer (as well as major enterprise/genuine business users), but I think you'll find they are not very tolerant of users who do not work in their interests.
Worth noting that, as much as people are picking up on the script-kiddy side of OVH, OVH actually has some seriously large clients, custom accounts/solutions, and provisions services well beyond the realm of kids, aimed squarely at enterprise.
But posting the facts isn't cool or sexy, so yeah - we'll ignore that.