Ok
So how much does it cost to store a 5TB file for a year on Amazon? And do they back it up or otherwise guarantee access to it?
Amazon has increased the maximum object size on its S3 online storage service to 5 terabytes. Previously, S3 users were forced to store large files in chunks no larger than about 5 gigabytes. "When a customer wanted to access a large file or share it with others, they would either have to use several URLs in Amazon S3 or …
According to the Amazon calculator tool: $655.36/month for 5TB of storage ($452.20 for non-redundant), plus $512 for 5TB of inbound data transfer to upload the file (assuming you manage it on the first try) and $767.85 for 5TB of outbound data for a single download.
So assuming you upload 5TB, store it for a year, then download it once, that works out to $9,144.17 for the year.
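For anyone who wants to check that sum, here's a quick sketch of the arithmetic using the 2010-era figures quoted above (they will have changed since):

```python
# Back-of-the-envelope S3 cost for the 5TB-for-a-year scenario,
# using the 2010-era prices quoted above.
storage_per_month = 655.36   # $/month for 5TB of redundant storage
inbound_transfer = 512.00    # $ one-off: upload 5TB
outbound_transfer = 767.85   # $ one-off: download 5TB once

total = storage_per_month * 12 + inbound_transfer + outbound_transfer
print(f"Total for the year: ${total:,.2f}")  # -> $9,144.17
```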
For the sake of comparison, OVH (a budget provider) has dedicated servers with 4x1.5TB drives (6TB of non-redundant storage) for £64.99/month, which includes 5TB/month of bandwidth at 100Mbit (after 5TB they drop the port speed to 10Mbit rather than sending a bill, although you can pay for higher port speeds/more bandwidth if needed; still far cheaper than Amazon's prices).
You can try to argue professional storage vs budget dedicated server, but the difference in price is so vast that you could get multiple dedicated servers from multiple providers to mirror the data and still save money, and you'd have several servers lying around that can be used to do something useful with the data (whereas Amazon would send you yet another bill if you want to actually use the data for anything).
Of course, if you simply want the data stored and don't need it to be accessible via the internet, then it's even cheaper to buy a pile of hard drives/tapes and stick them in several separate locations.
You make a good point, but to be fair I think there are some other factors that should be included:
1. If you use multiple budget servers, you need to find a way to reliably propagate updates between them. That's not difficult if you always upload to one server and replicate to the others. However Amazon achieves high availability by spreading uploads over a number of servers automatically. It doesn't matter if half their servers fail; you can still upload and access your files. Amazon have a proven, reliable setup that is ready to use.
2. You leave out administration costs entirely. Amazon's service includes the cost of 24x7 monitoring and support of the application. If you put together 3 or 4 budget servers, you need to have someone around to support them. Most ISPs just support the hardware, and even then won't monitor it or take action to fix it without you contacting them first.
3. Amazon S3 is fast to access from Amazon's other services, such as EC2. If you are using EC2 a lot, then the S3 cost is relatively small.
It depends what you want to do, and who you are. As you point out, it's possible to put something together, but Amazon does have a place in the market. As their charges are proportional, they are very cheap for small amounts of data compared to building your own.
I think your figures are a little askew and your alternative solution a little under-specified to be a usable, comparable alternative.
For a start, the whole point of S3 is durability and scalability. S3 duplicates your data on multiple pieces of hardware within a facility and across multiple facilities within a region. So your initial OVH 4x1.5TB server needs to run as a RAID array for a start, which takes your storage capacity down, so you need to double up on your initial server straight away to get back to your 6TB. You then need to duplicate this setup across three different OVH data centres. Your price has just gone up to £4,679.28, or $7,390.07.
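Working that through as a quick sketch (the exchange rate is simply the one implied by the figures above, roughly 1.58):

```python
# Rough cost of the mirrored, triple-site OVH alternative described above.
server_per_month = 64.99           # £ per 4x1.5TB server
servers_per_site = 2               # mirrored pair to get 6TB usable
sites = 3                          # three separate data centres

yearly_gbp = server_per_month * servers_per_site * sites * 12
gbp_to_usd = 7390.07 / yearly_gbp  # rate implied by the figures above (~1.58)
yearly_usd = yearly_gbp * gbp_to_usd
print(f"£{yearly_gbp:,.2f} ~ ${yearly_usd:,.2f}")        # £4,679.28 ~ $7,390.07
print(f"S3 premium: ${9144.17 - yearly_usd:,.2f}/year")  # -> $1,754.10
```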
So what do you get with S3 for the extra $1,754.10 per year that your alternative doesn't give you?
Well quite a lot actually.
1. You don't have to manage/monitor any hardware. That in itself is worth the extra dollar value.
2. You don't have to set up a method of replicating the data across the different facilities, and that's in real time, I might add. With S3, each file/object is duplicated as you upload it, checked for data corruption, and if necessary repopulated automatically with a copy of the uncorrupted data, before success is returned on the upload (there's a sketch of a checksum-verified upload after this list).
3. You don't have to come up with a method to re-populate data should one of the servers fail.
4. You have one single point of access, rather than having to switch between three different servers.
5. You are not limited to a 10Mbit cap once you have uploaded your 5TB of data. In fact, the limiting factor is your own connection to S3.
6. You don't have to buy and add new servers to all three facilities when your data storage exceeds your physical 6TB limit. S3 storage is unlimited; the whole point of cloud storage is that it scales.
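On point 2: S3 lets you send a checksum with the upload so it can verify the object before acknowledging it. A minimal sketch using the modern boto3 SDK (which post-dates this discussion), with hypothetical bucket/key names, purely as an illustration:

```python
import base64
import hashlib

import boto3

s3 = boto3.client("s3")

def put_with_checksum(bucket: str, key: str, data: bytes) -> None:
    """Upload an object with a Content-MD5 header; S3 rejects the PUT
    if what it stored doesn't match, so a successful response means
    the data arrived intact."""
    md5_b64 = base64.b64encode(hashlib.md5(data).digest()).decode()
    s3.put_object(Bucket=bucket, Key=key, Body=data, ContentMD5=md5_b64)

# Hypothetical names, for illustration only.
put_with_checksum("example-bucket", "backups/file.bin", b"hello world")
```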
And you've missed the biggest point of all, the reason for the article in fact.
How are you going to store a 5TB file on your 4x1.5TB HDDs? You'd have to break it into chunks, which is exactly what the announcement in the article aims to avoid.
The Heath Robinson solution you provided is in no way comparable to what S3 and AWS are offering. You've got a solution to a different problem, my friend.
You're talking about RAID 0 data splitting. We've already established that those disks are going to have to be mirrored. It won't work; you don't have the capacity. You'd have to split it up over servers.
You can use RAID 0 across all 4 disks if you want, but you're then going to have to mirror that data on another server, and you've introduced another problem: you've doubled your data transfer in copying that data off the server, and we're already down to a 10Mbit connection. Better settle down with a copy of War and Peace while you're waiting. That alternative solution is getting more and more painful. Easier to use S3, I think.
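For anyone wondering what "break it into chunks" means in practice, here's a minimal sketch (plain Python, hypothetical file names) of the client-side splitting the old ~5GB object limit forced on you:

```python
import os

PART_SIZE = 5 * 1024**3  # the old per-object ceiling: ~5GB
BUF_SIZE = 64 * 1024**2  # stream in 64MB blocks to keep memory sane

def split_file(path: str) -> list[str]:
    """Split a large file into <=5GB pieces so each fits under the old
    S3 object-size limit. You then have to upload, track, and later
    reassemble every piece: exactly the bookkeeping the new 5TB limit
    removes."""
    parts = []
    index = 0
    with open(path, "rb") as src:
        while True:
            written = 0
            part_path = f"{path}.part{index:05d}"
            with open(part_path, "wb") as dst:
                while written < PART_SIZE:
                    buf = src.read(min(BUF_SIZE, PART_SIZE - written))
                    if not buf:
                        break
                    dst.write(buf)
                    written += len(buf)
            if written == 0:
                os.remove(part_path)  # nothing left to read; drop the empty part
                break
            parts.append(part_path)
            index += 1
    return parts

# A full 5TB file would yield roughly a thousand pieces to shepherd around.
```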
Redo your math and take into consideration simultaneous replication between data centers with no extra thought or effort. Then do your math again to see what it would take to replicate your budget setup across facilities that span multiple continents.
It's easy to peg AWS as "expensive" if you are not honest about the true cost of (multiple) facilities, power, cooling, administrative burden, hardware failure/replacement, staff salary, the diesel in your generator tanks, etc. etc.
The people who claim AWS is super expensive seem to live in a fairyland where staff salaries are zero, operational burden is non-existent, and electricity, facilities, bandwidth and cooling cost nothing.
Cloud stuff is a numbers game, it's very easy to game things to fit whatever scenario you want to 'win'. I get into these depressing numbers games all the time unfortunately.
AWS is not a win for many situations, but to get an honest answer you need to start with honest assumptions about the true cost of delivering these services locally.
With the standard S3 service, any upload 'put' action into Amazon S3 is not acknowledged until the data has been written to multiple stores in at least two AWS datacenters, so you get reliability and multi-site replication automatically from the moment you begin using the service.
Amazon will NOT give you a 100% SLA or the sort of guarantee that would make your lawyer happy, but there is a tremendous amount of reliability and availability engineering built into S3; it's far safer than most local storage setups, for instance.
Pricing is public and available on the aws.amazon.com site.
The rate at which Amazon rolls out new services and enhances existing ones is the #1 reason why infrastructure-as-a-service competitors and the open source clones will fail, and fail badly. Just point your RSS reader at http://aws.typepad.com and peruse the last six months of articles: there are few other companies right now that could match the rate at which AWS rolls out cool new features or entirely new product lines. It's messed up, but right now, unless someone catches up within the next 10 months or so, Amazon is going to rule the infrastructure cloud world.
I'm one of the people interested in "big data" cloud storage, and the 5TB object size limit announcement following so quickly after the multi-part parallelized HTTP upload enhancement is simply mind-blowing. It's a game changer for more than a few technical disciplines and industries.
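For the curious, that multi-part parallel upload is exposed in today's SDKs as a transfer configuration. A rough sketch with boto3 (the bucket and file names are placeholders, and the SDK post-dates this article):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Anything over 100MB is sent as 100MB parts over parallel connections.
config = TransferConfig(
    multipart_threshold=100 * 1024**2,
    multipart_chunksize=100 * 1024**2,
    max_concurrency=8,  # number of parallel HTTP connections
)

# Hypothetical bucket/file names, for illustration only.
s3.upload_file("huge-dataset.bin", "example-bucket", "data/huge-dataset.bin",
               Config=config)
```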
Your headline hurt my head:
5 × 2^40 × 2^20 bytes = 5,242,880 tebibytes
where tebibytes once equalled terabytes, before HD manufacturers (and then clueless users, and then OS makers) ruined everything.
I was disappointed to read that we can only store 5TiB files on Amazon. Still, I guess that's better than the old limit.
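For anyone keeping score, the decimal/binary gap being complained about, as a quick sketch:

```python
TB = 10**12   # terabyte: what drive manufacturers mean
TiB = 2**40   # tebibyte: what your OS traditionally called a "TB"

print(f"5 TB  = {5 * TB:,} bytes")
print(f"5 TiB = {5 * TiB:,} bytes")
print(f"A tebibyte is {(TiB - TB) / TB:.1%} bigger than a terabyte")
```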
I'm with Comcast. Monthly bandwidth limit - 250GB. Just one Carbonite backup or restore and I'm done for the month - no more internet via cable. Good thing I still have dialup... oh, wait a minute... my "land line" phone is with Comcast, too. Maybe I can use Verizon at $1.99 per minute... OMG!
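To put numbers on that (a sketch; the link speeds are illustrative):

```python
def months_of_allowance(total_gb: float, cap_gb: float = 250) -> float:
    """How many monthly caps a transfer burns through on a capped line."""
    return total_gb / cap_gb

def transfer_days(total_bytes: float, mbit_per_s: float) -> float:
    """Days to move total_bytes over a link of the given speed."""
    return total_bytes * 8 / (mbit_per_s * 1e6) / 86400

five_tb = 5 * 10**12
print(f"5TB under a 250GB/month cap: {months_of_allowance(5000):.0f} months of allowance")
print(f"5TB at 100Mbit: {transfer_days(five_tb, 100):.1f} days")
print(f"5TB at 10Mbit: {transfer_days(five_tb, 10):.0f} days")
```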