Ruby metaprogramming will cost you documentation.

Ruby, like many other dynamic and modern languages, makes it easy for you to do fun stuff like metaprogramming.

Ruby, also like other nice languages, comes with a built-in documentation generator that scans your code for comments and makes them available in HTML and other formats.

... until you start metaprogramming.

Take a simple example, the Fizzler! The name of the class is unimportant; it just provides a new way to define methods, for the sake of showing some metaprogramming and how Ruby's rdoc fails on it.

class Fizzler
  def self.fizzle(method, &block)
    self.class_eval do
      define_method method, &block
    end
  end
end

class Bar < Fizzler
  # Print a worldly message
  fizzle :hello do
    puts "hello world!"
  end
  
  # A simple test
  def test
    puts "testing, 1 2 3!"
  end
end

# Now some sample code, let's invoke the new 'hello' method we generated with
# 'fizzle'.
bar = Bar.new
bar.hello
The output looks like this:
% ruby fizzler.rb 
hello world!
All is well! We are generating new methods on the fly, etc etc, all features of metaprogramming. However, we can never make this 'hello' method obviously available to the world via rdoc, at least as far as I can tell. The rdoc generated looks like this:

Note the lack of any mention of 'hello' as a method. I cannot simply do what works for lots of other normal ruby code and ask for the documentation of hello by running 'ri Bar#hello' - because rdoc simply doesn't see it.

I recall in python, if you were dynamically generating methods and classes, you could also inject their documentation by simply setting the '__doc__' property on your class or method. Ruby doesn't appear to have such a thing.
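One possible workaround is to have the method-defining helper record a docstring itself. The following is only a minimal sketch: the `doc` argument and the `fizzle_docs` accessor are made-up names for illustration, and rdoc and ri still know nothing about them.

```ruby
# Sketch only: `fizzle_docs` and the `doc` argument are hypothetical;
# they are not part of Ruby or rdoc.
class Fizzler
  # Per-class table of method name => documentation string.
  def self.fizzle_docs
    @fizzle_docs ||= {}
  end

  # Define the method as before, but also remember its documentation.
  def self.fizzle(method, doc = nil, &block)
    fizzle_docs[method] = doc
    define_method(method, &block)
  end
end

class Bar < Fizzler
  fizzle :hello, "Print a worldly message" do
    puts "hello world!"
  end
end

puts Bar.fizzle_docs[:hello]
```

This keeps the documentation queryable at runtime, which is roughly what '__doc__' gives you in python, but it does nothing for generated HTML or 'ri Bar#hello'.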

Additionally, in some metaprogramming cases, the stack traces are actually harder to read. For example, ActiveRecord makes extensive use of 'method_missing' rather than dynamically generating methods. The behavior is the same, but the stack traces are now littered with 'method_missing' and references to files and lines you don't own, rather than named functions and other useful pointers. This perhaps is a feature, but for cases like method_missing, being able to add other useful data onto the stack trace would greatly aid in debugging.

So, if long-term necessities like documentation and easy debuggability (stack traces, etc) are hindered by metaprogramming, at least in ruby, what are we left to do? Metaprogramming is clearly a win in some places, but the automatic losses seem to detract from any value it may have.

Bringing test tools to Nagios monitoring

With all the TDD (test-driven development) and BDD (behavior-driven development) going around these days, it'd be a shame not to use these tools on monitoring applications.

You might have a boatload of tests that test your application before you roll a new version, but do you use those tests while the application is in production? Can you? Yes!

Let's take an important example of monitoring some complex interaction, like searching google and checking the results. Simple with a mouse, but perhaps complex in code. Even if you wrote a script to do it, using an existing testing framework gets you pass/fail testing automatically.

For this example, I'll use the following ruby tools: rspec and webrat. This is fairly easy, though it took me a bit to find all the right documentation bits to clue me in to the right way.

require 'rubygems'
require 'webrat'

Spec::Runner.configure do |config|
  include Webrat::Methods
end

describe "google search for my name" do
  it "should include semicomplete.com in results" do
    visit "http://www.google.com/"
    webrat.response.title.should =~ /Google/
    query = "jordan sissel"
    fill_in "q", :with => query
    field_named("btnG").click
    webrat.response.title.should == "#{query} - Google Search"
    click_link "semicomplete.com"
  end
end
Now, we run this with the 'spec' tool:
% spec rspec-webrat.rb 
.

Finished in 0.578546 seconds

1 example, 0 failures
Seems ok. Let's break the test and see what happens. Change the 'visit' line to something else:
    visit "http://www.yahoo.com/"
Now rerun the test, which was checking specifically for google things in the page and will now fail on yahoo's page:
 % spec rspec-webrat.rb
F

1)
'google search for my name should include semicomplete.com in results' FAILED
expected: /Google/,
     got: "Yahoo!" (using =~)
./rspec-webrat.rb:29:

Finished in 0.186847 seconds

1 example, 1 failure
This output kind of sucks. Additionally, rspec failures seem to have exit code 1, not 2 as wanted by a nagios check reporting critical. Let's fix those. First, fixing the exit code can be hacked around directly in ruby if you want:
# Nagios checks expect exit code '2' to mean CRITICAL.
# Let's make any nonzero exit attempt always exit 2 (EXIT_CRITICAL).
EXIT_CRITICAL = 2
module Kernel
  alias :original_exit :exit
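  # Kernel#exit defaults to a successful exit when called with no argument,
  # so keep that default and treat only 0 and true as success.
  def exit(value = true)
    value = EXIT_CRITICAL unless value == 0 || value == true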
    original_exit(value)
  end
end
Fixing the output just means telling spec to use a different output format. I like the 'nested' output. Rerun that test now:
% spec -f nested rspec-webrat.rb
google search for my name
  should include semicomplete.com in results (FAILED - 1)

1)
'google search for my name should include semicomplete.com in results' FAILED
expected: /Google/,
     got: "Yahoo!" (using =~)
./rspec-webrat.rb:30:

Finished in 0.017534 seconds

1 example, 1 failure

% echo $?
2
All set.

Even better is that you can include multiple checks in the same script, if you want to. RSpec lets you select any test to run alone, so your nagios checks for a given web application could be as simple as:

define command {
  command_name check_google_for_semicomplete
  command_line /usr/bin/spec -f nested -e "google search for my name" mytests.rb
}
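
The exit-code convention the hack above relies on (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) can also be spelled out in a small wrapper. This is only a sketch: `nagios_result` is a made-up helper, and it assumes you already have the example and failure counts from a test run.

```ruby
# Nagios plugin exit codes.
NAGIOS_OK, NAGIOS_WARNING, NAGIOS_CRITICAL, NAGIOS_UNKNOWN = 0, 1, 2, 3

# Hypothetical helper: turn example/failure counts into a Nagios exit
# code plus the one-line status message Nagios displays.
def nagios_result(examples, failures)
  if examples == 0
    [NAGIOS_UNKNOWN, "UNKNOWN: no examples ran"]
  elsif failures > 0
    [NAGIOS_CRITICAL, "CRITICAL: #{failures} of #{examples} examples failed"]
  else
    [NAGIOS_OK, "OK: #{examples} examples passed"]
  end
end

code, message = nagios_result(1, 1)
puts message
# A real check script would finish with: exit code
```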

Ruby: Finding subclasses in your world

Use the ObjectSpace module to find all subclasses (descendants) of a given class.
class Foo; end
class Bar < Foo; end
class Baz < Foo; end

subclasses = ObjectSpace.each_object(Class).select do |klass|
  klass.ancestors.include?(Foo) and klass != Foo
end

# prints something like "[Baz, Bar]" (the order is not guaranteed)
p subclasses
Of course, you could always override Class#inherited instead, but if you don't want to override methods, the above is a reasonable choice.
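
For comparison, here is a minimal sketch of the Class#inherited approach. The `tracked_subclasses` accessor is a made-up name; the only Ruby-provided piece is the `inherited` hook itself.

```ruby
# Sketch: track subclasses with the inherited hook instead of scanning
# ObjectSpace. `tracked_subclasses` is a hypothetical accessor.
class Foo
  def self.tracked_subclasses
    @tracked_subclasses ||= []
  end

  # Ruby calls this each time a class inherits from Foo. Subclasses
  # inherit the hook too, so grandchildren are recorded as well.
  def self.inherited(subclass)
    super
    Foo.tracked_subclasses << subclass
  end
end

class Bar < Foo; end
class Baz < Foo; end
class Qux < Bar; end

p Foo.tracked_subclasses  # => [Bar, Baz, Qux]
```

Unlike the ObjectSpace scan, this only sees classes defined after the hook is in place, but the result comes back in definition order and costs nothing to look up.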

Ruby's DateTime::strptime vs libc strptime

A project I'm working on has some odd slowness about it. Using ruby-prof, I found that String#scan was consuming most of the time, but ruby-prof didn't tell me where the calls were coming from. A quick hack that replaced String#scan with my own method showed that the caller was DateTime.strptime:
class String
  def scan(*args)
    raise
  end
end
I tried using the ruby debugger to break on String#scan, but it didn't seem to work. PEBCAK, probably, which is why I used the solution above to just toss an exception when that function was called.
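
A gentler variant of the same hack records who called String#scan and then carries on, instead of raising. This is a sketch under assumptions: `SCAN_CALLERS` and `original_scan` are made-up names, and code that reads `$~` right after calling scan may misbehave, because the match data is now set inside the wrapper rather than in the caller.

```ruby
# Sketch: count the call sites of String#scan without interrupting
# the program. SCAN_CALLERS is a hypothetical name.
SCAN_CALLERS = Hash.new(0)

class String
  alias_method :original_scan, :scan

  def scan(*args, &block)
    # caller[0] is the "file:line:in ..." frame that invoked scan.
    SCAN_CALLERS[caller[0]] += 1
    original_scan(*args, &block)
  end
end

"a1b22c333".scan(/\d+/)

# Print the busiest call sites first.
SCAN_CALLERS.sort_by { |_, count| -count }.each do |frame, count|
  puts "#{count}\t#{frame}"
end
```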

Back to the point: DateTime.strptime is slow. Looking at the underlying code shows you why: date/format.rb - the _strptime_i method.

Lots of string shuffling, regular expressions to match field specifiers (%d, etc), string modification with more regexps, etc. The code is pretty easy to read, but it's still doing a lot of work it doesn't need to be doing. Luckily, libc comes with a method for parsing times in the same way: strptime.

So, I started working on an extension to the Time class that invokes libc's strptime and returns a Time instance: ruby-ctime. The usage is simple once you have the module:

require "CTime"

puts Time.strptime("%Y", "2009")
# outputs 'Wed Jan 00 00:00:00 +0000 2009'
The one major holdback with strptime is that there's no wide support for timezones. Format strings like %Z and %z work with strftime, but are generally unsupported by strptime. The exceptions: glibc supports %z, and freebsd appears to support both %Z and %z. Nothing reliably cross-platform. This is a historical problem due to the fact that the 'struct tm' structure has no timezone field (glibc and the bsds add 'long tm_gmtoff' to support timezones).

This means we'll have to correct for this by extending strptime to support it, but I'm not there yet.

Anyway, short benchmarking for features supported by both libc strptime and DateTime strptime shows libc a massive winner:

snack(~/projects/ruby-ctime) % ruby test.rb
Iterations: 10000
datetime: 7.680928 (1301.92601727291/sec)
my_strptime: 0.126583 (78999.5497025667/sec)
A 60x speedup using the new C code vs DateTime.strptime. This is a great start, but we still need timezone support. I need to hack timezone support into this, which probably means I'll start with glibc's strptime implementation.
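
A timing harness along these lines could produce numbers in the same shape as the output above. This is a sketch, not the actual test.rb: `report` is a made-up helper, and only DateTime.strptime is measured here because the C extension is not assumed to be installed.

```ruby
require "benchmark"
require "date"

# Sketch of a timing harness. `report` is a hypothetical helper that
# runs the given block `iterations` times and prints total seconds
# plus the rate per second.
def report(label, iterations)
  elapsed = Benchmark.realtime do
    iterations.times { yield }
  end
  puts "#{label}: #{elapsed} (#{iterations / elapsed}/sec)"
  elapsed
end

iterations = 10_000
puts "Iterations: #{iterations}"
report("datetime", iterations) do
  DateTime.strptime("2009-03-01 12:34:56", "%Y-%m-%d %H:%M:%S")
end
```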

Ruby Net::IMAP and Exchange

Exchange's server-side filters are pretty weak, so I decided to work around them by writing a tool that will fix my inbox and filter mail appropriately so that any client I use to view mail with (OWA, whatever) has the same view with no client-local filters. It's likely/possible there's already a tool that does this; let's ignore that possibility for now.

Ruby comes with Net::IMAP, but it doesn't come with an authenticator that supports 'PLAIN' auth, so we have to provide one:

# Learned the 'PLAIN' expected format from imapsync.
class PlainAuthenticator
  def process(data)
    # Net::IMAP takes care of base64 encoding the result of this...
    return "#{@user}\0#{@user}\0#{@password}"
  end
  
  def initialize(user, password)
    @user = user
    @password = password
  end
end

Net::IMAP::add_authenticator('PLAIN', PlainAuthenticator)
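
For reference, the string returned by `process` follows the SASL PLAIN layout: authorization identity, NUL, authentication identity, NUL, password. The sketch below builds and base64-encodes one by hand to show what ends up on the wire; the credentials are placeholders.

```ruby
# Sketch: build a PLAIN response by hand. User and password are
# placeholder values for illustration.
user, password = "jls", "secret"

# authzid NUL authcid NUL password
payload = "#{user}\0#{user}\0#{password}"

# "m0" is base64 without line breaks; Net::IMAP does this step for you.
encoded = [payload].pack("m0")
puts encoded

# Decoding it again shows the three fields.
p encoded.unpack("m").first.split("\0")  # => ["jls", "jls", "secret"]
```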
Now that we have that, let's try connecting.
imap = Net::IMAP.new("exchange.example.com", "imaps", usessl=true)
imap.authenticate("PLAIN", user, passwd)
This fails, because Exchange's IMAP server ignores the RFC:
/usr/lib/ruby/1.8/net/imap.rb:3122:in `parse_error': unexpected token CRLF (expected SPACE) (Net::IMAP::ResponseParseError)
        from /usr/lib/ruby/1.8/net/imap.rb:2974:in `match'
        from /usr/lib/ruby/1.8/net/imap.rb:1959:in `continue_req'
        from /usr/lib/ruby/1.8/net/imap.rb:1946:in `response'
...
Expected a space, not a crlf. The failure is in continue_req, which expects what the RFC says:
continue_req    ::= "+" SPACE (resp_text / base64)
However, Exchange's IMAP server doesn't send a space after the plus. Great, let's fix that by overriding the continue_req method:
# Copied/modified from net/imap.rb, don't modify that file, put this
# in your own code to override the continue_req method
module Net
  class IMAP
    class ResponseParser
      def continue_req
        match(T_PLUS)
        #match(T_SPACE)   # Comment this line out to not expect a space.
        return ContinuationRequest.new(resp_text, @str)
      end
    end
  end
end
Once you've done that, everything else seems to work normally. I have only tested listing mail folders thus far, but the hacks above allow you to get this far.

More rmagick/rvg playtime

While working on graphs tonight, I found that calculation and labelling of ticks should be provided by special 'tick' classes. The iterator (the 'each' method) takes a min and max value and yields ticks in that range. Some notes on the design and what it allows:
  • A 'tick' provider is just an iterable class (foo.each, etc) which provides the tick position and an optional label.
  • A graph can have multiple tickers per axis, allowing you to have 'major' ticks labeled while 'minor' ticks are not labeled, and even more than two layers of ticks on each axis.
  • The same tick classes can easily be used to draw both the graph ticks and the grid.
  • 'Time format' tickers become trivial.
  • A 'smart time ticker' can look at the min/max and determine the correct set of time ticks to display (display format, tick distance, tick alignment, etc). It can use multiple 'time ticker' instances internally (code reuse!)
I'm sure this has all been thought of before, but it's a research experience for me :)
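
The tick-provider idea described above can be sketched roughly as follows. The class name and constructor arguments are assumptions for illustration, not the actual rplot code.

```ruby
# Sketch of a tick provider. StepTicker is a hypothetical class; it
# yields evenly spaced ticks that sit on an alignment grid.
class StepTicker
  def initialize(alignment, step)
    @alignment = alignment
    @step = step
  end

  # Yield [position, label] for every tick between min and max.
  def each(min, max)
    # First tick at or above min that lands on the alignment grid.
    first = @alignment + ((min - @alignment) / @step.to_f).ceil * @step
    first.step(max, @step) do |position|
      yield position, position.to_s
    end
  end
end

StepTicker.new(0, 25).each(10, 100) do |position, label|
  puts "#{position}: #{label}"
end
```

Because the graph only ever calls `each(min, max)`, the same object can drive tick marks, labels, and grid lines, and a time-aware ticker only needs to change how positions and labels are computed.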

At any rate, I'm finding myself wondering if RMagick/rvg is really the right tool. It certainly makes doing graphics trivial, but even for what I see as a simple graph it takes a little over a second to render, which would hurt usability if multiple graphs needed rendering simultaneously.

The bottleneck seems to be text rendering. If I disable text display in the graph (tick labels, etc), rendering time drops by 0.5 seconds (from 1.1). Switching from 'gif' to 'png' output shaved an average of 0.2 seconds off rendering, which is interesting.

Today's results, with real data:

graph = RPlot::Graph.new(400, 200, "Ping results for www.google.com")
pingsource = RPlot::ArrayDataSource.new
File.open("/b/pingdata").each do |line|
  time,latency = line.split
  pingsource.points << [time.to_f, latency.to_f]
end
pingsource.points = pingsource.points[-300..-1]
graph.sources << pingsource
graph.xtickers << RPlot::SmartTimeTicker.new
graph.ytickers << RPlot::LabeledTicker.new(alignment=0, step=25)
graph.render("test.png")

Graphs in Ruby with RMagick

I'm always finding myself wanting to graph random data. Gnuplot is nice, but not enjoyably scriptable. Matplotlib in python is too matlab-ish, or was when I looked at it last (though it looks much improved now). Some ruby options exist (even ruby+gnuplot), but none were much to my tastes.

I started fiddling around with RMagick and stumbled across what it calls "RVG" (ruby vector graphics). From the site:

RVG (Ruby Vector Graphics) is a facade for RMagick's Draw class that supplies a drawing API based on the Scalable Vector Graphics W3C recommendation.
The API is pretty reasonable, hasn't hindered me yet, and feels good after hacking with it for a few hours. Simple operations like point translation, scaling, rotating, flipping, etc are simple in code; the api is well documented; and images can be embedded easily into one another, which makes it easy to write modular code.

Anyway, the goal of this adventure was to come up with something that would produce non-crappy plots. The main emphasis was on having a means to apply axis labels and ticks that wasn't painful. The result is below: (x-axis ticks are hour-aligned and have 12 hour steps, y-axis ticks are single-value aligned)

Here's the code that generates the above graph (using rplot.rb). A lot of things (like axis label tick alignment and stepping) are hardcoded right now, but that will obviously change if I decide this project needs attention (and I don't find something that does the same thing but better).

# graph some random stuff, like log(x) and sin(x)
# use time for the 'x' to demo time formatting
# each point is an hour (i * 3600)
graph = RPlot.new(400, 200, "Happy Graph")

points = 60
axis = GraphAxis.new
(1..points).each do |i|
  axis.points << [Time.now.to_f + i*3600, Math.log(i)]
end

axis2 = GraphAxis.new
(1..points).each do |i|
  axis2.points << [Time.now.to_f + i*3600, Math.sin((i / 2.0).to_f) + 1]
end

graph.axes << axis
graph.axes << axis2

graph.render("/home/jls/public_html/test.gif")

How not to write documentation.

I've grown accustomed to ruby having poorly accessible documentation. What I mean by 'poorly accessible' is that ri Array gives me a list of things Array can do (which is nice), but to actually find out about Array#delete I have to run ri Array.delete. Further, the online ruby documentation is better, though not great, and is somehow strangely different from the rdoc itself. I got used to Python's pydoc, which often helpfully shows you what appears to be "as much as possible" when you run it on a class, module, or method.

Maybe I'm doing it wrong. Either way, the following is annoying and unhelpful. While it tells me the arguments that should be passed, and what is returned, it doesn't help me really know more about the function. Luckily, I'm already familiar with select from other languages.

snack(~) % ri IO.select
------------------------------------------------------------- IO::select
     IO.select(read_array 
     [, write_array 
     [, error_array 
     [, timeout]]] ) =>  array  or  nil
------------------------------------------------------------------------
     See +Kernel#select+.
Ok, fine... Let's look at Kernel#select.
snack(~) % ri Kernel#select
---------------------------------------------------------- Kernel#select
     IO.select(read_array 
     [, write_array 
     [, error_array 
     [, timeout]]] ) =>  array  or  nil
------------------------------------------------------------------------
     See +Kernel#select+.
*sigh*

Riding on the failboat: Ruby, episode 1.

I'm quickly learning ruby at my new job. Today's POLA (principle of least astonishment) violation is blamed on my expectations of 'if' behavior in Ruby from what I know from Python, C, and other languages.

Everything except nil and false is considered true:

["0", [], {}, "", 0, true, false, nil].each { |x|
  bool = (x) ? true : false
  puts "#{x.inspect}: #{bool}"
}

"0": true
[]: true
{}: true
"": true
0: true
true: true
false: false
nil: false
I mostly expected every one of the outputs here except for the literal 0 value being true. Noted for future reference.

Also confusing is that Integer() will barf on most non-number inputs, but for some reason nil means 0.

irb(main):002:0> Integer(nil)
=> 0
Unexpected.

Ruby/Oniguruma code block patches

I love perl's (?{ code }) feature. I want it in other languages.

I spent some time on hacking this into ruby a few weeks ago. I finally got around to making patches.

In FreeBSD ports, I chose to build Ruby 1.8.6 with oniguruma for the regex engine. After doing 'make configure' you can apply these patches:

I haven't tested this on other platforms, and it's not feature complete, but it's close.