The bigger they are...
...the further they fall.
Surprising that, at their size and market share, they had to bring in a company to mitigate a DDoS attack...
Amazon has still not provided any useful information or insights into the DDoS attack that took down swathes of websites last week, so let’s turn to others that were watching. One such company is digital monitoring firm Catchpoint, which sent us its analysis of the attack in which it makes two broad conclusions: that Amazon …
Not so fast. If we assume that a typical white fluffy cumulus cloud weighs about 500 tons (it does), or ~120 KiloJubs (~1 LINQ Hotel Recycling unit), it is pretty massive, but that mass is spread through a volume of about a cubic mile (a normal-sized cloud). A bigger cloud could be ~2 cubic miles (or 1,000 tons).
I don't think that the existing Reg Standard Units give a suitably intuitive measure of large volumes with relatively low densities. Could I suggest the "FluffyCloud" for larger nebulous/amorphous masses, i.e. 1 FluffyCloud = 500 tons/cubic mile? At roughly 1.4 million Olympic-sized swimming pools to the cubic mile, that makes the FluffyCloud roughly 0.088 Jubs/Olympic-sized swimming pool (m/v) - see the sketch below.
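A quick back-of-the-envelope check of that conversion, with loudly flagged assumptions: "ton" is read as a metric tonne, and the kg-per-Jub figure is back-derived from the "~120 KiloJubs" above rather than from any official Reg Standard Units table:

```python
# Back-of-the-envelope check of the proposed FluffyCloud unit.
# Assumptions (not official Reg Standard Units conversions):
#   - "ton" here means a metric tonne (1,000 kg)
#   - 1 Jub ~= 4.17 kg, inferred from "500 tons ~= 120 KiloJubs" above
#   - ~1.4 million Olympic pools per cubic mile, as the post states

CLOUD_MASS_KG = 500 * 1_000           # 500 tonnes: one fluffy cumulus
KG_PER_JUB = CLOUD_MASS_KG / 120_000  # back-derived from ~120 KiloJubs
POOLS_PER_CUBIC_MILE = 1.4e6          # the post's round figure

jubs_per_pool = (CLOUD_MASS_KG / KG_PER_JUB) / POOLS_PER_CUBIC_MILE
print(f"1 FluffyCloud ~= {jubs_per_pool:.3f} Jubs per Olympic pool")
# -> ~0.086 Jubs per pool; the post's 0.088 is the same figure within rounding
```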
The S3 operations team monitors a zillion metrics but, as the article noted, it's almost entirely "inside" activity. No doubt customers started complaining about bucket availability, and there might have been metrics that showed a downward move in request rate that didn't jibe with historical levels.
The answer will be either to piggy-back off specialist anti-DDoS providers, or to stand up arms-length availability metrics viewed from 'outside' as regards geographically distributed name resolution (a sketch of the idea is below). Or, more likely, to write their own anti-DDoS implementation and embed it into the Route53 infrastructure. The S3 front end itself already has request rate-limiting, but I couldn't say how hardened it might be against a flood of malicious payloads.
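A minimal sketch of that "outside-looking-in" probing, assuming the `dig` CLI is installed; the probed hostname and resolver list are illustrative, not anything AWS actually runs:

```python
# Probe name resolution from several vantage points outside your own network.
# Assumes the `dig` command-line tool is available on PATH.
import subprocess

RESOLVERS = ["8.8.8.8", "1.1.1.1", "9.9.9.9"]  # diverse public resolvers
TARGET = "example-bucket.s3.amazonaws.com"     # hypothetical name to watch

def probe(resolver: str, name: str, timeout: int = 3) -> bool:
    """Return True if `resolver` resolves `name` within `timeout` seconds."""
    result = subprocess.run(
        ["dig", "+short", f"+time={timeout}", "+tries=1", f"@{resolver}", name],
        capture_output=True, text=True,
    )
    return result.returncode == 0 and bool(result.stdout.strip())

failures = [r for r in RESOLVERS if not probe(r, TARGET)]
if failures:
    print(f"ALERT: {TARGET} unresolvable via {failures} - possible DNS-layer attack")
```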
"These are not the DNS servers you're looking for."
In other news, a temporary cease-fire took effect in the latest Middle East conflict as US forces were unable to access their fire control systems due to a backhoe operator's error in West Virginia. The enemy used the outage as an opportunity to re-occupy areas captured in recent days by U.S. forces and bury their dead.
UPDATED: The government has denied reports that the backhoe operator was recruited as an agent by the enemy; however, investigations into recent suspicious payments made into the operator's bank account are ongoing.
I am old enough to remember "Nobody ever got fired for buying IBM". The same sort of principle applies here. Depending on how good your own IT department is, an in-house system may or may not be more reliable than a cloud-based one. However, a crucial difference is that if something takes out your in-house system, it will all be your responsibility; if something takes out Amazon, it will be quite a lot easier to persuade people that "the internet is broken". Further, if your in-house system has to interface with another company, it is quite likely that if Amazon goes down, it will take down that other company... and take down your utterly reliable in-house system too.
This doesn't apply so much if your application is safety-related, where people are probably watching absolute metrics of reliability.
I would say it is the other way around. If it is your system, you are responsible, whether you cock it up or whether your hoster cocks it up. I'd rather be in charge of my own destiny.
If I mess it up, I can run around and try and get it working again and my director will be kicking my arse. If it is a cloud service that goes down, I can sit there and twiddle my thumbs, while my director kicks my arse...
The same for a compromise, I'd rather have that in my hands, as I am legally responsible for any data breach. If I'm going to lose my job or end up in court, I'd rather it be for my own stupidity, rather than someone else's.
You can also look at this from a user's point of view. If there's a problem which takes out one service, most users can carry on doing "things" that aren't affected. Take out everything on a particular AWS node (are they called that?) or Azure thingy (look, in this respect I'm a user, not an admin :-) and dozens - maybe hundreds - of businesses are out of action, and possibly millions of people are sat - as you say - twiddling their thumbs.
As a very simple example, if my company uses local installs of MS Office and something goes wrong with my computer, it doesn't really affect anyone else and - all other things being equal - I could just move to a free computer (if I can find one) and carry on there. If my company is using Office 365 and something borks that, it's not just me, it's a couple of hundred colleagues, and countless thousands of others in businesses up and down the country.
But there's a whole teetering tower of dependencies in most of these things. What if the central file server goes down? I may be able to write a new Word document, but I'll not be able to edit any existing ones. What about DHCP or the Domain? Without those I can't even log on to the computer to do local work. What about external connectivity? Internal communications may work, but I can't send any emails or perhaps even make phone calls...
It depends on what you mean by impacted.
We were DoSed by Google a few years back. I'm assuming one of their servers was badly configured, and it was pouring 100Mbps down our 10Mbps line.
We couldn't use the Internet for a short time, no email, no web browsing. It didn't disrupt the business at all. All the data the employees needed was on premises, as were they.
After an hour of trying to contact Google (the "abuse" and "webmaster" addresses sent auto-replies saying the email had been deleted without being read; the telephone queue said to go to the relevant page on Google.com, which doesn't exist for being attacked by Google, then disconnected after 10 minutes of back and forth; the official Google Twitter accounts didn't react), I contacted our ISP and got them to put perimeter DDoS prevention in place, and we were back up and running within 1.5 hours. We then had time to switch over to our backup connection and change the DNS entries for our mail and VPN servers.
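A minimal sketch of the sort of perimeter-side check that flags a flood like this early, assuming a Linux gateway exposing /proc/net/dev; the interface name and alert threshold are illustrative:

```python
# Rough sketch: alert when inbound traffic saturates the uplink.
# Assumes a Linux box with /proc/net/dev; the interface name and the
# 8 Mbit/s threshold (80% of a 10 Mbit/s line) are illustrative.
import time

IFACE = "eth0"
THRESHOLD_BPS = 8_000_000  # bits per second

def rx_bytes(iface: str) -> int:
    """Read the interface's cumulative received-bytes counter."""
    with open("/proc/net/dev") as f:
        for line in f:
            if line.strip().startswith(iface + ":"):
                return int(line.split(":")[1].split()[0])
    raise ValueError(f"interface {iface} not found")

before = rx_bytes(IFACE)
time.sleep(10)
rate_bps = (rx_bytes(IFACE) - before) * 8 / 10  # bits/s over the 10s window
if rate_bps > THRESHOLD_BPS:
    print(f"ALERT: inbound {rate_bps / 1e6:.1f} Mbit/s - line likely saturated")
```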
We were lucky, because the company had a policy of no cloud services, everything had to be on-site - many of the customers insisted that the confidential information about their production facilities was never stored or transferred over the Internet. And the company wasn't one that needed the Internet to do business.
Google were still bombarding the connection a month later, when I re-checked.
No. They were pumping 100Mbps at our gateway.
If it had been cloud services, the services would have been unaffected, but we wouldn't have been able to reach them and we wouldn't have been able to work until the problem was resolved.
Having the users and the services on the same bit of cable is very helpful, when combined with off-site backups.
For a large company, spread around the country or the world, that is another matter, but if all your users are in one place, there is no real gain to not having the data where all your users are.
Depends on how those 500 businesses are deployed.
If they have something in common that can be DDoSed - i.e. an ISP's infrastructure, or a CDN's - it could look similar. If they are more distributed, the DDoSers will require more "power" and coordination to attack them all at once. Still, smaller businesses may be built on smaller, less robust infrastructure, which may be easier to strike down even with less powerful attacks.
The problem may not even be the "cloud" itself; it's the concentration into only a couple of providers, which become a huge single point of failure, and any issue can then quickly propagate to thousands of businesses - even those that are only "collateral damage" because the attack was mainly directed at others.
Then it depends on what kind of disruption you fear more. A DDoS may take your system "off the internet", but on-premises systems can still work inside the company and let people keep on working. Of course, the damage depends on how important outside access is. Systems moved to the cloud could impact even internal operations.
I used to help with a lot of network development and testing, helping quite a few companies to shape their switches and monitors, including some RMON devices.
They used to send me their latest hardware and software to install and beta test.
I moved on from that role to become an international corporate's IT Manager, but installed "non corporate" firewalls and monitors so that I knew what was happening on the network.
Many of the companies I "worked" with have been hoovered up by the bigger players or fallen by the wayside, but there is no excuse really.
If you are responsible for the well-being of a major network which has an impact upon business viability, then you have to install proper monitoring mechanisms to let you know very quickly if a problem develops.
Even then I got fed up with attacks, so used reverse DNS and whois (sketch below) to go after the US-based ISP.
After the initial "nowt to do wi' us", the support chap realised that he was actually being attacked as well - it had taken over one of his servers. A quick flurry of activity and the problem was solved. I later got an email to say that they had many susceptible servers in line for takeover, and they had now decided that instead of sitting back on server patches they would instigate a better maintenance schedule to minimise the chances of it happening again.
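For anyone wanting to do the same kind of tracing, a minimal sketch, assuming a Unix-ish box with the `whois` command-line tool installed; the IP address is a documentation placeholder, not a real attacker:

```python
# Attacker-tracing sketch: reverse DNS plus whois for an offending IP.
# Assumes the `whois` CLI is installed; 203.0.113.42 is an RFC 5737
# documentation address used here purely as a placeholder.
import socket
import subprocess

offender = "203.0.113.42"

# Reverse DNS: what does this address claim to be?
try:
    hostname, _, _ = socket.gethostbyaddr(offender)
    print(f"{offender} reverses to {hostname}")
except socket.herror:
    print(f"{offender} has no PTR record")

# whois: which ISP owns the netblock, and what's their abuse contact?
result = subprocess.run(["whois", offender], capture_output=True, text=True)
for line in result.stdout.splitlines():
    # Heuristic filter for the fields most useful when reporting abuse
    if any(key in line.lower() for key in ("orgname", "abuse", "netname")):
        print(line.strip())
```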
I did not really appreciate the effect of my monitoring on our companies until the corporate Finance Controller left to freelance elsewhere. On a keep-in-touch visit, I was told that they had not realised that big companies still had major IT issues, as there had not been any in four years with us, yet they were a constant occurrence in the other companies they had been into.
Although this attack looks like a transient problem, I think it's pointing at something bigger.
The whole purpose of cloud computing is that people will outsource some things they are not very interested in doing to people who are interested in doing them, and who in particular can do them for less money than the outsourcers can. The way they achieve that is scale: the cloud people run very large environments, and do it in such a way that the effort needed to run them goes up more slowly than the effort involved in running lots of little environments. This means there are many fewer large environments than there would otherwise be, costs go down, and everyone is happy.
So far so good. But there's another aspect to this. Any serious failure in the relatively small number of very large environments now has a much larger effect than a corresponding failure in one of the previous large number of small environments. In particular there can be nasty correlated risks, of the sort that had famously bad consequences in 2007-2008: multiple organisations which seem to be independent, and whose chances of failure are treated as if they were independent, are in fact not independent at all, and all fail at once.
Well, this would not be a problem if the small number of very large environments were all partitioned in such a way that failures of one part can't cause other parts to fail. But the whole reason for these things existing is scale, and scale means using techniques which can control these very large environments with a very small number of people. Scale means that, if you want to make some change, you push it out across the whole huge environment, or significant chunks of it, in one go; in particular, you don't go around each of the tens of thousands of small chunks of the environment corresponding to each customer, applying it to each one (the sketch below makes the trade-off concrete).
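A toy sketch of the tension: pushing a change cell by cell limits blast radius, but costs exactly the per-chunk effort that scale economics try to eliminate. Cell names and the health check are made up for illustration; nothing here reflects how any real cloud provider deploys:

```python
# Toy staged-rollout loop: push a change one cell at a time, stopping at the
# first sign of trouble, so a bad change is caught before it is everywhere.
CELLS = ["us-east-a", "us-east-b", "eu-west-a", "ap-south-a"]

def deploy(cell: str, change: str) -> None:
    print(f"deploying {change!r} to {cell}")

def healthy(cell: str) -> bool:
    return True  # placeholder: error rates, latency, canary checks...

def staged_rollout(change: str) -> None:
    for cell in CELLS:
        deploy(cell, change)
        if not healthy(cell):
            print(f"halting rollout: {cell} unhealthy after {change!r}")
            return
    print("rollout complete")

staged_rollout("dns-config-v2")
# The economics of scale push the other way: one big push is cheaper than
# per-cell babysitting, which is exactly the risk described above.
```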
And that's great, so long as you are very, very sure that the small number of central points of control for these huge environments are extremely safe: that it is not possible to push out some bad change by mistake, and that it is not possible for some bad actor to gain control of one of these central points and do so intentionally. And, of course, bad actors will be very, very interested in gaining access to these central points of control. And some of these bad actors are nation states, with the resources of nation states: they can run thousands of people, including people who might, for instance, get jobs in one of these places.
And none of these platforms have existed for very long: AWS started in 2002, 17 years ago, and they've spent most of the time since dealing with huge rates of growth with all the problems that brings. How good are their security practices? Well, we know that in 2013 the NSA, who are meant to be good at the whole security thing, leaked a vast amount of information because of completely deficient security practices: let's hope that AWS are better than the NSA were in 2013.
In practice this is all a time-bomb waiting to go off. The cloud people may be very good at security indeed, but their whole business model is based on scale and thus on central control of huge environments, and the terrible state of computing security in general combined with the huge prizes to be won by controlling such environments means that, in due course, there will be a compromise. And at that point everyone who shares the infrastructure behind the compromised environment is going to be in trouble. Let's hope that the people who try to work out when risks are correlated have actually done their job when that happens (hint: they haven't).