But everything's OK.
They had it all backed up to the cloud.
A power outage fried hardware within one of Amazon Web Services' data centers during America's Labor Day weekend, causing some customer data to be lost. When the power went out, and backup generators subsequently failed, some virtual server instances evaporated – and some cloud-hosted volumes were destroyed and had to be …
No. Customers who lost data on their EBS volumes did NOT have it backed up to anywhere – including the cloud. EBS is nothing more than very resilient block storage. Nowhere in its service description or SLA does it imply that you do not need to back it up. In fact, in the service description it blatantly states you can expect to lose 1 or 2 volumes a year if you have 1000 volumes. ... so you should use EBS snapshots and back it up.
Customers that lost data had no backup of their EBS data.
Err, no. Customers who lost data most likely had backups, just not real-time ones. If you backed up at midnight and had a failure at midday, for example, a portion of your data will not be backed up, hence you suffer data loss. It wasn't because you didn't have backups, though, just the timeliness of them.
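To put a number on the midnight/midday example above, here is a minimal sketch of the exposure window. The dates are made up for illustration, and `data_loss_window` is a hypothetical helper, not anything AWS provides.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Return how much recent work is unprotected when a failure hits.

    Everything written after the last completed backup is lost.
    """
    if failure < last_backup:
        raise ValueError("failure time is earlier than the last backup")
    return failure - last_backup


# Illustrative timestamps only: backup at midnight, failure at midday.
last_backup = datetime(2022, 1, 1, 0, 0)
failure = datetime(2022, 1, 1, 12, 0)
print(data_loss_window(last_backup, failure))  # 12:00:00
```

Shrinking that window means backing up more often, which is a cost and design decision on the customer's side.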
I've heard so many times that you can migrate your on-premise servers to the cloud in a more or less 1:1 mapping and let your cloud provider do all the work of maintaining uptime and data integrity.
And yet again we have proof that you still have to put in the effort to ensure you have geographically diversified replication and backups.
Yes, you can migrate your on-premises systems to EC2 in a 1:1 mapping -- as long as one of those servers is the backup server ;) (or, of course, the cloud equivalent of such). These customers migrated but left backup out of the equation.
"Made to believe." Not by the product descriptioin or the SLA, that's for sure. The product description comes right out and says you will lose 1 or 2 volumes a year if you have 1000 volumes -- so backup. The SLA does not imply anything about backup.
So if they were "made to believe" that, they didn't get it from Amazon.
Gonna get some downvotes for this, but if you use EBS, that sort of failure is to be expected. Amazon say if you have 1000 EBS volumes running for a year, you should expect one or two to fail and have to be restored from backup. Those numbers are obviously averages across all AWS DCs, whilst problems tend to be concentrated at particular DCs.
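For anyone who wants the arithmetic behind "one or two in a thousand", here is a rough sketch. It treats the quoted figure as an annual failure rate of 0.1% to 0.2% per volume and assumes failures are independent, which, as noted above, they are not when problems cluster in one DC. Function names are hypothetical.

```python
def expected_failures(volumes: int, annual_failure_rate: float) -> float:
    """Average number of volumes expected to fail in one year."""
    return volumes * annual_failure_rate


def prob_at_least_one_failure(volumes: int, annual_failure_rate: float) -> float:
    """Chance of losing at least one volume in a year.

    Assumes independent failures, which is a simplification.
    """
    return 1.0 - (1.0 - annual_failure_rate) ** volumes


# 0.1% and 0.2% correspond to "1 or 2 volumes a year per 1000 volumes".
for rate in (0.001, 0.002):
    for count in (1, 100, 1000):
        print(
            f"rate={rate:.1%} volumes={count}: "
            f"expected={expected_failures(count, rate):.2f}, "
            f"P(at least one)={prob_at_least_one_failure(count, rate):.1%}"
        )
```

Even under the optimistic independence assumption, a fleet of 1000 volumes is more likely than not to lose at least one in a year.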
If you put data in there that you absolutely must be able to restore, you should either use different storage or take snapshots as regularly as you need them. EBS is the equivalent of local disk storage; it's not cross-AZ. If you require proper resilience in the cloud, you should be using something like S3, and designing your systems appropriately to use that sort of storage.
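As a minimal sketch of "take snapshots as regularly as you need them": the function below takes an EC2 client as a parameter and calls its `create_snapshot` method. It assumes you pass in something like `boto3.client("ec2")` with credentials and a region already configured. The helper name, the description text, and the volume ID in the comment are placeholders, and scheduling (cron or similar) is left to you.

```python
from datetime import datetime, timezone


def snapshot_volume(ec2_client, volume_id: str) -> str:
    """Request an EBS snapshot of one volume and return the snapshot ID.

    ec2_client is assumed to behave like boto3.client("ec2").
    """
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    response = ec2_client.create_snapshot(
        VolumeId=volume_id,
        Description=f"scheduled backup of {volume_id} at {stamp}",
    )
    return response["SnapshotId"]


# Hypothetical usage, run from a scheduler as often as your data requires:
#   import boto3
#   snapshot_volume(boto3.client("ec2"), "vol-0123456789abcdef0")
```

The call returns as soon as the snapshot is requested, not when it has completed, so a real job would also want to check the snapshot's state and prune old ones.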
No Tom, just stop it!
This thread is for ill-informed rants about "the cloud is just someone else's computer". Someone who uses the cloud, understands exactly what this storage is supposed to provide and then points it out is gonna get short shrift.
Given that Cloud is sold to manglements on the basis that it takes away all those complications of dealing with their in-house expert staff and hands it over to people who'll just do the work without arguing, those rants seem fully justified.
It is somebody else's computer. When using your own computers you expect someone on your staff to look after them. If you've been persuaded to use somebody else's because it's cheaper you might reasonably expect that somebody else to do the looking after. Anything else smacks of keeping a dog and barking yourself.
"Manglements" should at least know how to read an SLA. That's what they are good at. Someone somewhere should mention that this doesn't include backup, and if they're doing their job then they would double-check that. And if they double-checked it, they would find out they are responsible for backup.
EBS is a very secret system that no one is allowed to understand. The only details AWS will provide about it are its general service and uptime/redundancy, but you aren't allowed to know how it works or how it's redundant.
I can't understand why anyone in technology wouldn't want to understand how it works, but all these developers and PHBs seem to be fine with not knowing.
I had a days-long discussion with a developer at one point, explaining to them that I have never had a storage failure on a RAID system in the more than 20 years I have been doing this at many different orgs, some of them Fortune 500. When it finally dawned on him that it's not normal for businesses to incur data loss due to disk failure, he was shocked (because he was a developer and just thought about things related to what he knew, like his desktop computer).
No offence to any devs out there...
This post has been deleted by a moderator
EBS is very resilient storage, but it's not fool-proof.
And, no, they were not lucky the snapshot survived. An EBS "snapshot" is actually an image stored as an object on S3, which is an entirely different system. Unlike EBS, S3 is replicated across three AZs, and the replication happens at an object level. EBS is only replicated within an AZ, and it uses block-level replication.
As I read the article, they had multiple failures, and it buggered multiple systems simultaneously. 99.5% were able to recover, and most of the 0.5% had EBS snapshots so they could recover as well. But some of those did not have snapshots, so they actually lost data.
...that's 'cause you get managers who say "I want to be infrastructure free. It will save us loads of money" and, despite being told it won't and there is no such thing as infrastructure free, they want stuff done on the cheap. So they think moving to the Cloud means the service will do everything for them backup-wise, not understanding that you have to set that up yourself and pay for it.
Biting the hand that feeds IT © 1998–2022