EC2 reserved vs on-demand costs (and R graphs!)

I'm sure this is covered well elsewhere online, but that's never the point of these things ;)

I was helping with some capacity planning and run-rate math today at work and found that EC2 reserved instances are much cheaper than on-demand. If this is obvious to you, chill out; I have historically never really used EC2, nor have I ever been close to budgeting. ;)

I proved this conclusion with some math, but frankly I like visualizations better, so I decided to learn R. I wrote an R script that graphs on-demand vs reserved pricing for one m1.large instance (code at end of the post).

The result is this graph:

The graph says it all, and definitely tells me that we need to be reserving all of our instances at Loggly - and it gives me a rule-of-thumb:

  • If we're going to use one instance unit for at least 9 months, reserve for 3 years.
  • If we're going to use one instance unit for at least 6 months, reserve for 1 year.
  • Otherwise, stick with on-demand.

The "reserved instances" pay structure works like this: you pay a one-time fee for access to a reduced hourly rate.
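
The break-even points behind that rule of thumb fall out of the pay structure directly. Here is a small Ruby sketch of the arithmetic, using the m1.large prices from the R script in this post. The helper names (break_even_days, total_cost) are mine, made up for illustration, and it assumes the instance runs around the clock:

```ruby
# Prices copied from the m1.large figures used in this post's R script.
ON_DEMAND_HOURLY = 0.34
RESERVE_HOURLY   = 0.12
RESERVE_1YEAR    = 910.0
RESERVE_3YEAR    = 1400.0

# Days of continuous use after which the one-time fee has been paid back
# by the cheaper hourly rate. (Hypothetical helper, for illustration.)
def break_even_days(upfront, on_demand_hourly, reserve_hourly)
  upfront / ((on_demand_hourly - reserve_hourly) * 24)
end

# Total cost of running one instance 24 hours a day for `days` days.
def total_cost(days, hourly, upfront = 0.0)
  upfront + hourly * 24 * days
end

days_1y = break_even_days(RESERVE_1YEAR, ON_DEMAND_HOURLY, RESERVE_HOURLY)
days_3y = break_even_days(RESERVE_3YEAR, ON_DEMAND_HOURLY, RESERVE_HOURLY)

three_years  = 3 * 365
on_demand_3y = total_cost(three_years, ON_DEMAND_HOURLY)
reserved_3y  = total_cost(three_years, RESERVE_HOURLY, RESERVE_3YEAR)

printf("1-year reserve breaks even after %.0f days\n", days_1y)
printf("3-year reserve breaks even after %.0f days\n", days_3y)
printf("3 years on-demand: $%.0f, 3 years reserved: $%.0f\n", on_demand_3y, reserved_3y)
```

With these prices the break-even points land at roughly 172 and 265 days, which is where the "6 months" and "9 months" figures come from.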

This also means that our random "debug something" deployments that are shut down much of the time are probably best off being reserved instances as well - at least for a 1-year term - since we are likely to use those deployments for more than half of a year.

A 3-year on-demand price for m1.large is just shy of $9000, which is roughly twice as expensive as the 3-year reserve. Plan your capacity and maybe start buying reserved instances. Make your CFO happy.

And in case you were going to ask, I ran the same plot with data from EC2 "quadruple extra large" instances and the savings and break-even points were the same. I bet the rest of the prices follow a similar pattern.

The R script follows; run it with 'R --save < yourscript.r':

# Values taken from
# for an m1.large ("Large") instance
on_demand_hourly = 0.34
reserve_hourly = 0.12
reserve_1year = 910       
reserve_3year = 1400

# quadruple extra large instances
#on_demand_hourly = 1.60
#reserve_hourly = 0.56
#reserve_1year = 4290
#reserve_3year = 6590

on_demand_daily = on_demand_hourly * 24
reserve_daily = reserve_hourly * 24
x <- c(0, 365)
y <- on_demand_daily * x

# Calculate day of break-even point reserve vs on-demand rates
break_1year_x = reserve_1year / (on_demand_daily - reserve_daily)
break_3year_x = reserve_3year / (on_demand_daily - reserve_daily)

png(filename = "ec2_m1large_cost.png", width = 500, height=375)
plot(x,y, type="l", col='red', xlab="", ylab="cost ($USD)")
title("EC2 cost analysis for m1.large", 
      sprintf("(days)\n1-year is cheaper than on-demand after %.0f days of usage,\n 3-year is cheaper after %.0f days", break_1year_x, break_3year_x))
text(60, 0, sprintf("on-demand=$%.2f/hour", on_demand_hourly), pos=3)

abline(reserve_1year, reserve_daily, col='green')
text(60, reserve_1year, sprintf("1-year=$%.0f+$%.2f/hour", reserve_1year, reserve_hourly), pos=3)

abline(reserve_3year, reserve_daily, col='blue')
text(60, reserve_3year, sprintf("3-year=$%.0f+$%.2f/hour", reserve_3year, reserve_hourly), pos=3)

point_y = reserve_1year + reserve_daily * break_1year_x
points(break_1year_x, point_y)
text(break_1year_x, point_y, labels = sprintf("%.0f days", break_1year_x), pos=1)

point_y = reserve_3year + reserve_daily * break_3year_x
points(break_3year_x, point_y)
text(break_3year_x, point_y, labels = sprintf("%.0f days", break_3year_x), pos=1)

Introducing FPM - Effing Package Management

Having become fed up with dealing with rpmbuild, spec files, debian control files, dh_make, debuild, and the whole lot, I automated my way back to sanity.

The result is a tool I call "fpm" which aims to help you make and mangle packages however you choose, all (ideally) without having to care about the internals of your particular native package format.

The goal of this project is not to undermine upstream packaging but to grant everyone the ability to trivially build and edit packages. Why? Not all software is packaged. Not every version you want is packaged. And further, not all users are willing or able to take the time to learn all the ins and outs of their package build tools.

For example, you can package up your /etc/init.d directory as an RPM by doing simply this:

% fpm -s dir -t rpm -n myinitfiles -v 1.0 /etc/init.d
Created /home/jls/rpm/myinitfiles-1.0.x86_64.rpm

fpm will create a simple package for you and put it in your current directory. The result:

% rpm -qp myinitfiles-1.0.x86_64.rpm -l

% rpm -qp myinitfiles-1.0.x86_64.rpm --provides
myinitfiles = 1.0-1
% rpm -qp myinitfiles-1.0.x86_64.rpm --requires
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(CompressedFileNames) <= 3.0.4-1

You can package up any directory. But there's more.

Above, I didn't specify a package summary, so how about fixing the rpm to include one? You can use RPMs as the source (-s flag) in fpm. There's also a helpful '-e' (--edit) flag that'll let you edit the rpm spec (or debian control) file before building.

% rpm -qp myinitfiles-1.0.x86_64.rpm --info | grep Summary
Summary     : no summary given

% fpm -s rpm -t rpm -e myinitfiles-1.0.x86_64.rpm
... this opens up $EDITOR so you can edit the spec file it generated ...
... make some changes to the spec, including adding a proper 'Summary' ...
Created /home/jls/rpm/myinitfiles-1.0-1.x86_64.rpm

% rpm -qp myinitfiles-1.0-1.x86_64.rpm --info | grep Summary
Summary     : my /etc/init.d directory

The '-s dir' flag says the source of the package is a directory. There's also support for other package sources like rubygems, other rpms, debs, and more on the way.

With FPM, you can specify dependencies, architecture, maintainer, etc. All from a simple command line, and never forcing you to learn the pain and suffering that can come with rpm spec files or debian package building.

You can install fpm with: gem install fpm

The project page is here:

The wiki is here (has more examples):

jQuery Mobile - Full Height content area

I was working on integrating jQuery Mobile stuff into fingerpoken and needed a way to make the content area of pages full-screen. By 'full screen' I mean still showing the header and footer, but otherwise the content needs to fill the rest.

I couldn't find an easy way to do this while googling, and even the jQuery Mobile demos didn't do it.

So here's a demo of what I came up with: fullheight jQuery Mobile demo. The JavaScript:

  var fixgeometry = function() {
    /* Some orientation changes leave the scroll position at something
     * that isn't 0,0. This is annoying for user experience. */
    scroll(0, 0);

    /* Calculate the geometry that our content area should take */
    var header = $(".header:visible");
    var footer = $(".footer:visible");
    var content = $(".content:visible");
    var viewport_height = $(window).height();
    var content_height = viewport_height - header.outerHeight() - footer.outerHeight();
    /* Trim margin/border/padding height */
    content_height -= (content.outerHeight() - content.height());
    /* Apply the computed height to the content area */
    content.height(content_height);
  }; /* fixgeometry */

  $(document).ready(function() {
    $(window).bind("orientationchange resize pageshow", fixgeometry);
  });

Introducing: fingerpoken - a mobile device as a touchpad/remote/keyboard

I'm giving a presentation this week at the Puppet Bay Area user meetup and while working on slides, I wanted to be able to present while not being attached to my laptop.

Enter: fingerpoken

Fingerpoken lets you turn your iphone/ipad/itouch into a touchpad, keyboard, and remote for another computer. The only required piece on your iphone is Safari. No appstore stuff to download!

Under the hood, it uses websockets and touch events and sends JSON-encoded requests to your workstation and will move the mouse, type, scroll, and more.

Project page: fingerpoken on github.

A short demonstration of this project in action:

SysAdvent 2010 now online!

Today starts the 3rd year of the SysAdvent calendar! How time flies!

What is SysAdvent? It's a 24-day event where I publish one excellent sysadmin article each day, starting December 1st. The articles are written by fellow sysadmins around the world.

Planning for this year has been quite a success. Many folks have contributed finished articles or drafts already and more have committed to writing about a topic. This is a huge step for the SysAdvent project.

SysAdvent is for sysadmins, so please share sysadvent with your coworkers, reddit, digg, twitter, and any other places with sysadmin communities.

The first article for this year is about Linux Containers (LXC). Go, read! Add sysadvent to your rss reader, too :)

Also, there are still 50ish articles from the past two years, all quite good. Have a look at the 2008 and 2009 years, too!

Puppet "pure fact-driven" nodeless configuration

Truth should guide your configuration management tools.

Truth in this case is: what machines you have, properties of those machines, roles for those machines, etc. For example " is a webserver" is a piece of truth. Where and how you store truth is up to you and out of scope for this post.

My goal is to have truth steer everything about my infrastructure. Roles, jobs, and even long-term one-offs get put into the truth source (like a machine role, etc). That way, if the machine with a one-off dies, I can just add that machine role to another system and puppet will configure it - no pain and no fire to fight.

A simple example of truth in puppet is puppet's "node" type:

node "" {
  include apache
}

Specifying each node in puppet doesn't scale very well for many people. Further, you may already have your node information in another tool (ldap, mysql, etc). To allow this, puppet lets you feed 'node' information from an external tool (called an external node classifier). However, I found that using an external node classifier also has its drawbacks (also out of scope for this post).

To avoid complex logic in a node classifier, I've gone with a pure fact-based puppet configuration which I call "nodeless."

My puppet site.pp looks basically like this:

node default {
  include truth::enforcer
}

That's it. No extra nodes, no 'include' or conditionals randomly sprayed around.

From there, I have the truth::enforcer class include other classes, do sanity checking, etc, all based on facts (and properties if you use an external node classifier).
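
To make the idea concrete, here is a rough sketch in plain Ruby rather than puppet. It is not the real truth::enforcer: the "roles" fact (a comma-separated list), the ROLE_CLASSES table, and the haproxy class are all assumptions made up for illustration. The only point is that the class list comes from facts alone, with a sanity check along the way:

```ruby
# Hypothetical mapping from a role name to the classes that role implies.
ROLE_CLASSES = {
  "webserver"    => ["apache"],
  "loadbalancer" => ["haproxy"],
}

# Decide which classes a machine gets purely from its facts.
# Assumes a made-up "roles" fact holding a comma-separated list of roles.
def classes_for(facts)
  roles = facts.fetch("roles", "").split(",").map(&:strip).reject(&:empty?)

  # Sanity check: fail loudly on a role nobody has defined.
  unknown = roles - ROLE_CLASSES.keys
  raise ArgumentError, "unknown roles: #{unknown.join(', ')}" unless unknown.empty?

  roles.flat_map { |role| ROLE_CLASSES[role] }.uniq
end

classes = classes_for({ "roles" => "webserver, loadbalancer" })
puts classes.inspect
```

Moving a one-off to another machine then means changing that machine's facts, not editing a node definition.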

Fully standalone example with code can be found here:

Thoughts and a crazy prediction on ipv6

Every few months there's a new (or old!) article that floats around the echochamber of blogs and twitter that lays out predictions about the rate of IPv4 address consumption and targets our imminent destruction. People parrot off quotes from this article, dates, etc. It's amusing to me given the frequency with which new posts occur compared to the inaction.

This post isn't about predicting when or why we should care, because that's such a boring, dead horse, I'd rather not kick it. Poor little horsey.

Realistically, I think the whole v4 to v6 transition is going to be pretty lame and unexciting. Existing owners of IP space probably won't care and will move to IPv6 when the business need demands it. I want excitement, fireworks, and explosions. That's what makes movies awesome, isn't it? That's what keeps Michael Bay in business, anyway.

So let's talk about explosions. Our story begins.

Week 1: We're out of IP space. Shit hits the fan. No one really understands what IPv4 or IPv6 is, or why we should care, but we're out of it, so there's a media frenzy. Stories on the front page of the Wall Street Journal declare that ecommerce is dead due to this mysterious affliction called "eye pee vee four". Someone makes a crappy metaphor of a mall being out of retail spaces. C-level folks around the world panic: "We can't launch tomorrow! THE MALL IS FULL!!!" - misinformation leads decisions on product delays. Stock market takes a small dip. Nobody really knows what is going on, but everyone is sure the sky is falling. Rumors of undead walking in San Francisco spike on Twitter.

Week 2: CTOs do internal investigations to see how much IP space they own. The CFO gets wind that they have "thousands" of IP addresses and remembers last week's panic about how there are no more IP addresses, but doesn't really know what an "eye pee" is, but it sounds funny. An idea hits. IPv4 address space becomes a profit center. CFO presents to the board about sales/rental/lease models for IP addresses. There's a slide with a graph on it that has a line that goes up and to the right.

Week 4: IT Operations is told by the exec team that priority #1 is to consolidate IP usage. No reason is given. Profit incentives feed IT management and pressure mounts on the operations team from the inside. Six IPv4 auction sites launch, including Godaddy. Three of the new sites have venture funding.

Week 6: C-level pressure to monetize this new scarce resource results in deals struck, contracts signed, and ultimately IP addresses sold. Except nobody involved knows how the internet works, so they sell individual addresses or otherwise small pieces here and there. Later in the week, eBay updates their TOS to deny sale of /32 addresses.

Week 8: Cloud providers, thanks to CFO-profit-center drives, now start charging for public IP addresses.

Week 12: Due to poor communication, nobody involved in any of these single-ip-address sales has gotten the memo that for technology reasons you can't usually share routing announcements that small due to filtering, performance, and routing table size problems.

Week 14: Major network hardware vendors get wind of this new practice. Despite internal pressure against it, Cisco announces that new router upgrades and hardware are available to support the massive routing tables expected. The news causes Cisco's stock to jump. Other major vendors follow with similar announcements.

Week 18: Thousands of small address spaces have been sold or rented. Companies are forced to advertise even smaller routes over BGP. Network transit customer support lines are full 24 hours a day with customers complaining that their new routing advertisements are not functioning. Routing table size grows by four times. God kills a kitten.

Week 20: Major routers everywhere are overwhelmed. Struggling to maintain active users, Facebook buys AOL for its dialup service and announces a new feature called "facebook keywords". You smell something burning. The internet dies.

logstash is ready for use

I've talked for a while about logging problems. Parsing, storing, searching, reacting.

Today is the first release of logstash.

What is logstash? Free and open source log/event management and search. It's designed to scale and help you more easily manage your logs and provides you a great way to search them for debugging, postmortems, and other analysis.

You can read more about it on

Introducing: Ruby Minstrel - a method instrumenter

The following tools are awesome: strace, truss, ltrace, dtrace, and systemtap.

Sometimes I'm trying to debug a ruby library or application, and I end up monkeypatching things just to see what arguments are being passed as a way of sanity checking configuration or correctness. Other times I want to profile the time spent only in a certain class, or method, etc. At a basic level, I'm looking for a simple ltrace equivalent in ruby.

Enter minstrel. There may be projects out there already that do this, but I don't know of one, so it got written tonight.

You can 'gem install minstrel' to get it (or here (rubygems) and here (github))

My standard path of debugging (without other options) is to sanity check my code and then dive into the code for whatever app/library I am using. It often requires root access to modify ruby libs on the system, which sucks for one-off debugging. Writing up monkeypatches guessing at methods that I should inspect is error prone and sucky. Monkeypatching for debugging is common for me and is about as efficient/productive as using LD_PRELOAD to hack in my own method calls. (see liboverride). It sucks.

Minstrel is something better than the usual hope+monkeypatch+time combination. For example, let's use minstrel to debug why mcollective's 'mc-ping' is failing:

snack(~) % /usr/sbin/mc-ping
connect failed: Connection refused - connect(2) will retry in 5

Ok, connection refused. To what? Yes, I could use strace or tcpdump to debug this particular issue. But that's not the point here. After looking at the mcollective code for a few minutes I came up with a few classes I want to instrument.

snack(~) % RUBY_INSTRUMENT="MCollective::Connector::Stomp,Stomp::Connection,TCPSocket" minstrel /usr/sbin/mc-ping
Wrap of TCPSocket successful
Wrap of Stomp::Connection successful
Wrap of MCollective::Connector::Stomp successful
enter MCollective::Connector::Stomp#connect([])
enter Stomp::Connection#socket([])
class_enter TCPSocket#open(["localhost", 6163])
class_exit_exception TCPSocket#open(["localhost", 6163])
connect failed: Connection refused - connect(2) will retry in 5
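
If you're curious how enter/exit output like this can be produced at all, here is a minimal sketch of method wrapping in plain Ruby. To be clear, this is not minstrel's actual code; the instrument helper and the Greeter class are made up for illustration, and it only handles positional arguments to keep things short:

```ruby
# Hypothetical helper (not part of minstrel): wrap every public instance
# method defined directly on a class so each call is logged on entry,
# on normal exit, and on exit via an exception.
def instrument(klass, log)
  klass.public_instance_methods(false).each do |name|
    original = klass.instance_method(name)
    klass.send(:define_method, name) do |*args, &block|
      log << "enter #{klass}##{name}(#{args.inspect})"
      begin
        result = original.bind(self).call(*args, &block)
        log << "exit #{klass}##{name}(#{args.inspect})"
        result
      rescue StandardError
        log << "exit_exception #{klass}##{name}(#{args.inspect})"
        raise
      end
    end
  end
end

# Made-up class to instrument.
class Greeter
  def greet(name)
    raise ArgumentError, "name required" if name.nil?
    "hello #{name}"
  end
end

log = []
instrument(Greeter, log)
greeting = Greeter.new.greet("world")
puts log
```

The wrapped method still returns its normal value and still raises its normal exceptions, so the program under inspection behaves the same, just with a trace on the side.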

Puppet data into mcollective facts

There's a plugin for mcollective that lets you use puppet as your mcollective fact source. However, it doesn't seem to support the plugins-in-modules approach that puppet allows, and I don't want to have to regenerate and restart mcollective any time I add new fact modules.

Mcollective comes with a yaml fact plugin by default which will load facts from a yaml file of your choice. Exporting puppet facts in yaml format is super trivial during a puppet run:

  package {
    "mcollective": ensure => "0.4.10-1";
  }

  file {
      ensure => file,
      content => inline_template("<%= scope.to_hash.reject { |k,v| !( k.is_a?(String) && v.is_a?(String) ) }.to_yaml %>"),
      require => Package["mcollective"];
  }

Easy. Now each puppet run will dump its fact/parameter knowledge to that file and mcollective can use those facts:

% mc-facts lsbdistrelease
Report for fact: lsbdistrelease                            

        10.04                                   found 18 times

Finished processing 18 hosts in 5533.86 ms

An added benefit of this is that any puppet variables (not just facts!) that are in scope are included in the yaml output. This lets you write "facts" to feed mcollective and puppet from plain puppet manifests. Awesome!

Update: Looks like there's a bug/feature somewhere that causes puppet to output yaml that mcollective can't handle due to sorting problems (like '!ruby/sym _timestamp'). To solve this, filter out any scope hash entries whose keys or values are not strings. I have updated the code above to reflect this. Future mcollective releases will handle funky data more safely.
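
Here's what that filter does, as a standalone Ruby sketch outside of puppet. The scope_hash below is a made-up stand-in for scope.to_hash (the keys and values are invented for illustration); the reject block is the same one used in the inline_template above:

```ruby
require "yaml"

# Made-up stand-in for puppet's scope.to_hash, including the kind of
# non-string entries that cause trouble (a symbol key, a Time, an Integer).
scope_hash = {
  "lsbdistrelease" => "10.04",
  "role"           => "webserver",
  :_timestamp      => Time.now,
  "processorcount" => 4,
}

# Same filter as the inline_template: keep only String => String pairs.
facts = scope_hash.reject { |k, v| !(k.is_a?(String) && v.is_a?(String)) }

puts facts.to_yaml
```

Only the two string-to-string entries survive, so the yaml that gets written is plain keys and plain string values.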