EC2 reserved vs on-demand costs (and R graphs!)

I'm sure this is covered well elsewhere online, but that's never the point of these things ;)

I was helping with some capacity planning and run-rate math today at work and found that EC2 reserved instances are much cheaper than on-demand. If this is obvious to you, chill out; I have historically never really used EC2, nor have I ever been close to budgeting. ;)

I proved this conclusion with some math, but frankly I like visualizations better, so I decided to learn R. I wrote an R script that graphs on-demand vs reserved pricing for one m1.large instance (code at the end of the post).

The result is this graph:

The graph says it all, and it definitely tells me that we need to be reserving all of our instances at Loggly. It also gives me a rule of thumb:

  • If we're going to use one instance unit for at least 9 months, reserve for 3 years.
  • If we're going to use one instance unit for at least 6 months, reserve for 1 year.
  • Otherwise, stick with on-demand.
The "reserved instances" pay structure is you pay a one-time fee for access to a reduced hourly rate.

This also means that our random "debug something" deployments that are shut down much of the time are probably best off being reserved instances as well, at least on a 1-year term, since we are likely to use those deployments for more than half of a year.

A 3-year on-demand price for m1.large is just shy of $9000, roughly twice the cost of the 3-year reservation. Capacity plan and maybe start buying reserved instances. Make your CFO happy.
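The totals behind that comparison, with the same prices:

days = 3 * 365
on_demand_total = 0.34 * 24 * days          # ~$8,935 over three years of on-demand
reserved_total = 1400 + 0.12 * 24 * days    # ~$4,554: upfront fee plus the reduced hourly rate
print(on_demand_total, reserved_total)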

And in case you were going to ask, I ran the same plot with data from EC2 "quadruple extra large" instances and the savings and break-even points were the same. I bet the rest of the prices flow similarly.
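Plugging in the quadruple extra large prices (commented out in the script below) lands on nearly the same break-even days:

savings_per_day = (1.60 - 0.56) * 24   # $24.96/day saved while running
print(4290 / savings_per_day)          # ~172 days for the 1-year reservation
print(6590 / savings_per_day)          # ~264 days for the 3-year reservation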

The R script follows; run it with 'R --save < yourscript.r':

# Values taken from http://aws.amazon.com/ec2/pricing/
# for an m1.large ("Large") instance
on_demand_hourly = 0.34
reserve_hourly = 0.12
reserve_1year = 910       
reserve_3year = 1400

# quadruple extra large instances
#on_demand_hourly = 1.60
#reserve_hourly = 0.56
#reserve_1year = 4290
#reserve_3year = 6590

on_demand_daily = on_demand_hourly * 24
reserve_daily = reserve_hourly * 24
x <- c(0, 365)
y <- on_demand_daily * x

# Calculate day of break-even point reserve vs on-demand rates
break_1year_x = reserve_1year / (on_demand_daily - reserve_daily)
break_3year_x = reserve_3year / (on_demand_daily - reserve_daily)

png(filename = "ec2_m1large_cost.png", width = 500, height=375)
plot(x,y, type="l", col='red', xlab="", ylab="cost ($USD)")
title("EC2 cost analysis for m1.large", 
      sprintf("(days)\n1-year is cheaper than on-demand after %.0f days of usage,\n 3-year is cheaper after %.0f days", break_1year_x, break_3year_x))
text(60, 0, sprintf("on-demand=$%.2f/hour", on_demand_hourly), pos=3)

# abline(intercept, slope): the upfront fee is the intercept, the reserved daily rate is the slope
abline(reserve_1year, reserve_daily, col='green')
text(60, reserve_1year, sprintf("1-year=$%.0f+$%.2f/hour", reserve_1year, reserve_hourly), pos=3)

abline(reserve_3year, reserve_daily, col='blue')
text(60, reserve_3year, sprintf("3-year=$%.0f+$%.2f/hour", reserve_3year, reserve_hourly), pos=3)

point_y = reserve_1year + reserve_daily * break_1year_x
points(break_1year_x, point_y)
text(break_1year_x, point_y, labels = sprintf("%.0f days", break_1year_x), pos=1)

point_y = reserve_3year + reserve_daily * break_3year_x
points(break_3year_x, point_y)
text(break_3year_x, point_y, labels = sprintf("%.0f days", break_3year_x), pos=1)

dev.off()
quit()

Fedora 6, utmp growth, Amazon EC2

% ls -l /var/run/[wu]tmp
-rw-rw-r-- 1 root  utmp  364366464 Aug 13 22:00 utmp
-rw-rw-r-- 1 root  utmp  1743665280 Aug 13 22:10 wtmp
That's 350 megs and 1.7 gigs. Cute. Performance sucks for anything that needs utmp (w, uptime, top, etc.). The 'init' process is spending tons of time chewing through CPU; system %cpu usage says 38% and holds there on a mostly idle machine.

Lots of these in /var/log/messages:

Aug 13 22:10:27 domU-XX-XX-XX-XX-XX-XX /sbin/mingetty[6843]: tty3: No such file or directory
Aug 13 22:10:27 domU-XX-XX-XX-XX-XX-XX /sbin/mingetty[6844]: tty4: No such file or directory
Aug 13 22:10:27 domU-XX-XX-XX-XX-XX-XX /sbin/mingetty[6845]: tty5: No such file or directory
Aug 13 22:10:27 domU-XX-XX-XX-XX-XX-XX /sbin/mingetty[6846]: tty6: No such file or directory
Aug 13 22:10:32 domU-XX-XX-XX-XX-XX-XX /sbin/mingetty[6847]: tty2: No such file or directory
I'm not sure why /dev/tty1 is the only /dev/ttyN device, but whatever. Either way, mingetty flapping will flood /var/run/[ubw]tmp over the span of weeks and eventually you end up with a system that spends most of its time parsing that file and/or restarting mingetty.

I fixed this by commenting out all tty entries in /etc/inittab and running "init q":

# Run gettys in standard runlevels
#1:2345:respawn:/sbin/mingetty tty1
#2:2345:respawn:/sbin/mingetty tty2
#3:2345:respawn:/sbin/mingetty tty3
#4:2345:respawn:/sbin/mingetty tty4
#5:2345:respawn:/sbin/mingetty tty5
#6:2345:respawn:/sbin/mingetty tty6

Pulling album covers from Amazon

Amazon provides lots of web services. One of these is its E-Commerce API, which allows you to search its vast product database (among other things).

In Pimp, the page for any given listening station shows you the current song being played. Along with that, I wanted to provide the album cover for the current track.

You can leverage Amazon's API to search for a given artist and album, eventually leading you to a picture of the album cover. To this end, I wrote a little Python module that lets you search for an artist and album name combination and gives you a link to the album cover.

So, I wrote albumcover.py as a prototype to turn an artist and album into a URL to the album cover image. It works for the 20 or so tests I've put through it.
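For the curious, here is a rough sketch of the sort of ItemSearch request a module like this can make against Amazon's E-Commerce Service. The endpoint, parameter names, and response layout below are my own assumptions about the REST interface rather than a copy of albumcover.py, and Amazon requires your own access key (and, in later versions of the API, signed requests):

import urllib.request
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

def album_cover_url(artist, album, access_key):
    """Ask Amazon ECS for a music item matching artist/album and
    return the large image URL, or None if nothing matched."""
    params = urlencode({
        "Service": "AWSECommerceService",
        "AWSAccessKeyId": access_key,   # your own AWS key; required by Amazon
        "Operation": "ItemSearch",
        "SearchIndex": "Music",
        "Artist": artist,
        "Title": album,
        "ResponseGroup": "Images",
    })
    url = "http://ecs.amazonaws.com/onca/xml?" + params
    tree = ET.parse(urllib.request.urlopen(url))
    # The response XML is namespaced, so match on the tail of the tag names.
    for elem in tree.iter():
        if elem.tag.endswith("LargeImage"):
            for child in elem:
                if child.tag.endswith("URL"):
                    return child.text
    return None

# Example usage (hypothetical key):
# print(album_cover_url("Tom Waits", "Rain Dogs", "YOUR-ACCESS-KEY"))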