Your opening paragraph says $5/GB/month?
Seagate's EVault cloud backup subsidiary has launched a competitor to Amazon's Glacier cloud archive, priced at $15 per TB per month, with instant data access and data preserved intact for decades. It will use Seagate Kinetic drives in the future. Amazon started the cloud archive ball rolling in a big way with its Glacier service, costing $0.01/GB/ …
Thursday 19th December 2013 02:39 GMT Anonymous Coward
> data retrieval access taking three to five hours
This is due to the Glacier racks not having enough power provisioned to spin up more than about 1/5 of the drives in any given rack (about 900 drives per rack, I think) lest they blow the circuit breaker. It has *NOTHING* to do with the type of drives used (bog-standard 5400-7200RPM SATA/SAS disks). Since you have to queue the jobs up and figure out which drives to spin up in which racks in which datacenters without blowing the power budget (not sure if Glacier uses the same erasure-coding N:K format as the rest of S3), it takes "a lot" of time to reassemble the stream and spool it (in)directly to S3 for customers to fetch.
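The scheduling problem described above can be sketched as a greedy batcher that never lets a rack exceed its spin-up budget. This is purely illustrative - the drive counts, the 1/5 budget, and the scheduling policy are the commenter's guesses, not Amazon's actual design:

```python
# Hypothetical sketch: batch retrieval jobs so that no rack ever has
# more than its power budget's worth of drives spinning at once.
# DRIVES_PER_RACK and the 1/5 budget come from the comment above;
# everything else is illustrative.
from collections import defaultdict

DRIVES_PER_RACK = 900
SPINUP_BUDGET = DRIVES_PER_RACK // 5   # max drives spinning per rack

def schedule(jobs):
    """jobs: list of (rack_id, drive_id) pairs to read from.
    Returns a list of batches, each respecting the per-rack budget."""
    batches = []
    pending = list(jobs)
    while pending:
        spinning = defaultdict(set)   # rack -> drives spun up this batch
        batch, deferred = [], []
        for rack, drive in pending:
            # Re-using an already-spinning drive is free; otherwise
            # only spin up a new drive if the rack has budget left.
            if drive in spinning[rack] or len(spinning[rack]) < SPINUP_BUDGET:
                spinning[rack].add(drive)
                batch.append((rack, drive))
            else:
                deferred.append((rack, drive))
        batches.append(batch)
        pending = deferred
    return batches
```

With 400 distinct drives requested in one rack, this yields three batches (180, 180, 40) - which is why a burst of requests against the same rack serializes into hours of waiting.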
I don't think the 3+hr wait is currently because of demand, but rather a function of trying to "merge" requests so the scheduler can re-use the most assets. They have to run periodic scrubbing and parity rebuilds too. Drives can sustain many load/unload events (~300,000); with an expected replacement period of, say, 3 years, that works out to about 270 events per day, or 11 per hour. I expect Amazon's goal is much lower than that, probably no more than ONE per hour.
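The load/unload arithmetic above checks out; here is the back-of-envelope calculation, using the 300,000-cycle rating and 3-year life assumed in the comment:

```python
# Back-of-envelope: how many load/unload cycles per hour a drive can
# "spend" if it must survive a given service life.
RATED_CYCLES = 300_000   # typical rated load/unload events (assumed)
SERVICE_YEARS = 3        # assumed replacement period

per_day = RATED_CYCLES / (SERVICE_YEARS * 365)
per_hour = per_day / 24

print(f"{per_day:.0f} events/day, {per_hour:.1f} events/hour")
# roughly 274 per day, about 11 per hour - matching the figures above
```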
When shingled (SMR) drives arrive at Glacier and Seagate, Seagate will not be able to keep its stated SLA unless it has gone to extraordinary lengths to isolate vibration (from fans, air handlers, and of course the drives themselves), since otherwise it can't keep more than 2 or 3 drives per 80+ disk chassis spun up at the same time. Maybe they can bump that figure by running platters at half speed, or maybe vibration only really matters during WRITES - so spin down all but a couple of drives when it's time for the enclosure to take a WRITE workload.
Thursday 19th December 2013 21:12 GMT Rob Isrob
Tape or Disk for Glacier?
"This is due to the Glacier racks not having enough power provisioned to them to spin up more than about 1/5 of the drives in any given rack"
You sure about that? Reference please!
"Contrary to previous ideas that Amazon's Glacier cloud archival storage uses disk-based object storage, the latest word is that it is based on LTO6 tape. This comes from a nameless but senior person in the IT industry who "cannot talk about it". El Reg has also heard the same from another source in the general IT industry.
This idea of tape being the Glacier store fits in with the longish retrieval time for Amazon Glacier data and the cheap-as-chips cost structure Glacier has."
The latest word from a reliable source reveals that it uses LTO6. This was leaked by a senior Amazon source at the recently held IP Expo.
SpectraLogic is going from strength to strength. It has just been confirmed by a third person "familiar with the situation" that Amazon's Glacier archive service uses SpectraLogic tape libraries, thus enabling its low cost.
Which makes sense. Architecturally, I think they made the right choice, with further LTO generations coming online. I suspect Amazon will soon punish competitors by reducing Glacier's already cheap prices. Using tape as a back-end, they have a lot more headroom to cut prices, as tape is quite a bit cheaper than hard drives.
But the problem in general - long-term archiving - demands the lowest cost. Retrieval time? Meh. We'll see how EVault does if it stays ~50% more costly than Amazon.
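The "~50% more costly" figure follows directly from the prices quoted in the article ($15/TB/month for EVault, $0.01/GB/month for Glacier, using decimal TB):

```python
# Checking the "~50% more costly" claim against the stated prices.
evault_per_gb = 15 / 1000     # $15 per TB/month -> $/GB/month
glacier_per_gb = 0.01         # Glacier's stated $/GB/month

premium = evault_per_gb / glacier_per_gb - 1
print(f"EVault costs {premium:.0%} more than Glacier per GB")
# -> EVault costs 50% more than Glacier per GB
```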
Tuesday 24th December 2013 16:29 GMT Anonymous Coward
Seriously? An anonymous quote from over a year ago, back when every dog was speculating as to what it was and it was advantageous to Amazon to throw everyone wildly off the scent? Amazon corporate may indeed be running tape silos. I know what 'big' tape jukeboxes look like at Dept of Commerce scale, and if you wanted to do AWS tape archiving you'd need a mind-boggling amount of floor space. Glacier is no less reliable than S3, which means it has to use erasure coding. Doing that with tape drives in multiple locations gets a touch messy. Not that they couldn't slice the data and queue it up on spool servers to dump in big batches.
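For readers unfamiliar with why erasure coding matters here: an N:K scheme stores K data pieces plus (N-K) parity pieces and survives the loss of any (N-K) of them, at far lower storage overhead than plain replication. A quick illustration (the 10+4 parameters are a common example, not Amazon's actual scheme):

```python
# Storage overhead of erasure coding vs replication. Parameters are
# illustrative; Amazon's actual N:K layout is not public.
def overhead(n_data, n_parity):
    """Raw-storage multiplier for a scheme with n_data data pieces
    and n_parity parity pieces, surviving any n_parity losses."""
    return (n_data + n_parity) / n_data

print(f"10+4 erasure coding: {overhead(10, 4):.1f}x raw storage, survives any 4 losses")
print(f"3-way replication:   {overhead(1, 2):.1f}x raw storage, survives any 2 losses")
```

The catch, as noted above, is that reconstructing an object needs any K of the N pieces online at once - easy with always-spinning disks, messy when the pieces live on tapes in robots at multiple sites.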
As to the purported "high-level" source that "can't talk about it"... Ha Ha Ha
Friday 27th December 2013 20:05 GMT Rob Isrob
Re: erasure coding. Perhaps nothing is erased. TSM redux.
There were 3 quotes there, not a single quote. In a competitive industry, you often have to rely on somewhat sketchy sources - you would think. But anyhow, come over here for a follow-up comment:
Glacier is based, we understand, on data stored in tape libraries
Let's call it Snowfield for short and predict a cost of $0.0125/GB/month with Glacier potentially dropping to $0.0075/GB/month.