Wednesday, August 22, 2012

Is Amazon Glacier Really As Cheap As It Seems? The Math Might Be Surprising

I was very excited to read the announcement of Amazon Glacier yesterday, and I signed up when I had a few minutes last night.

At first, it seems perfect. $0.01 US per GB per month seems crazy cheap. I love the idea of only paying for what you use. Just look at that table of potential storage costs.
How could you really go wrong at 100 GB for $1.00?
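To make the storage side concrete, here is a tiny Python sketch of that arithmetic. The only number taken from the pricing is the $0.01 per GB per month rate; the function name and the sample sizes are just illustrative, and the result covers storage only, not requests, transfer, or retrieval.

```python
# Storage-only estimate at the advertised rate of $0.01 per GB per month.
# Retrieval, request, and transfer charges are deliberately left out.
STORAGE_RATE_USD_PER_GB_MONTH = 0.01


def monthly_storage_cost(stored_gb, rate=STORAGE_RATE_USD_PER_GB_MONTH):
    """Return the monthly storage cost in USD, rounded to cents."""
    return round(stored_gb * rate, 2)


if __name__ == "__main__":
    # Sample archive sizes, chosen only for illustration.
    for gb in (100, 500, 1000):
        print(f"{gb:>5} GB -> ${monthly_storage_cost(gb):.2f} per month")
```

Storage really is that simple, which is what makes the retrieval side such a contrast.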
The first thing that went wrong: I didn't know there was no client! The only way to get data into Glacier is via API calls. I'm sure that existing clients that support S3 will adapt to include Glacier support as well. It's early days.
I thought about rolling my own, but Amazon only has Glacier SDKs for Java and .NET, so no quick OS X app could be cobbled together to get my data in. I mean, I work with REST APIs all day long, and Glacier has one of those too, but I wasn't looking for that kind of work last night.
While doing a last-ditch search for a client that already supported Glacier, I stumbled on the Hacker News post "Beware that retrieval fee!" I hadn't paid much attention to the retrieval fees when I activated my account. I'm thinking of Glacier as an emergency, the-house-burned-down kind of retrieval situation. So how much would that scenario cost?
That's where things get…complicated. How can Amazon offer a storage service and not provide a calculator or spreadsheet that helps customers estimate their costs? Seems like Amazon is hiding the true retrieval costs because, well, look at the math I came up with.


If my math is right, those download costs sure add up quickly. Calculating this stuff appears to be intentionally tricky. The Paid Retrieval columns represent my best guess based on the information I found, but I could totally be wrong. I tried to use the Glacier FAQ formulas to work it out, but it's crazy complicated and written mostly as prose! After I did the formulas one way I thought could be right, I re-read all the discussion and theories about the formulas on the Hacker News thread, then found this Wired article. In the Update section, Amazon lays out a completely different formula for the Billable Peak column if you're downloading your whole archive. I used that because it was easier and I think it better fits the scenario I'm looking at, which is immediate disaster recovery. If you've lost everything, you don't want to trickle-download your archive to stay under the GBs / hour column.
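For anyone who would rather poke at this in code than in a spreadsheet, here is a rough Python sketch of a whole-archive retrieval estimate. Every constant in it is an assumption based on one reading of the Glacier FAQ at launch, not a confirmed number: a free allowance of 5% of stored data per month prorated daily, a retrieval spread evenly over about 4 hours, a fee of $0.01 per GB applied to the billable peak hourly rate, and a 720-hour month. The function and parameter names are made up for illustration, so treat the output as an estimate and change the defaults if your reading differs.

```python
def estimate_retrieval_fee(
    stored_gb,
    retrieved_gb,
    retrieval_hours=4.0,   # assumption: the retrieval is spread evenly over ~4 hours
    free_fraction=0.05,    # assumption: 5% of stored data is free per month
    days_in_month=30,      # assumption: the free allowance is prorated over 30 days
    hours_in_month=720,    # assumption: the peak rate is billed across 720 hours
    fee_per_gb=0.01,       # assumption: $0.01 per GB retrieval rate
):
    """Estimate the retrieval fee in USD for one retrieval in a single day."""
    # Peak hourly rate if the data comes back evenly across the window.
    peak_gb_per_hour = retrieved_gb / retrieval_hours

    # Daily free allowance, then the slice of it that falls in the peak hour.
    free_gb_per_day = stored_gb * free_fraction / days_in_month
    free_gb_in_peak_hour = free_gb_per_day / retrieval_hours

    # Only the part of the peak rate above the free slice is billable.
    billable_peak = max(0.0, peak_gb_per_hour - free_gb_in_peak_hour)

    # The billable peak rate is charged as if sustained for the whole month.
    return billable_peak * hours_in_month * fee_per_gb


if __name__ == "__main__":
    # Disaster-recovery scenario: pull back the entire archive at once.
    for gb in (100, 500, 1000):
        fee = estimate_retrieval_fee(stored_gb=gb, retrieved_gb=gb)
        print(f"{gb:>5} GB archive -> about ${fee:.2f} to retrieve")
```

Raising `retrieval_hours` models the trickle download: the same data spread over more hours gives a lower peak rate and a lower fee under these assumptions.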
If you want to play with the math yourself, you can use the spreadsheet I started with:

Amazon Glacier Pricing Math - Numbers
Amazon Glacier Pricing Math - Excel