Ruby's DateTime::strptime vs libc strptime
Posted Sat, 12 Sep 2009
class String
  def scan(*args)
    raise
  end
end
I tried using the Ruby debugger to break on String#scan, but it didn't seem to
work. PEBCAK, probably, which is why I used the hack above: it throws an
exception whenever that method is called, and the backtrace shows who called it.
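If aborting on the first call is too blunt, a gentler variant of the same trick logs the caller and then runs the real method. This is only a sketch: the scan_without_trace alias name is made up for illustration, and blocks that rely on $~ or $1 may behave differently when called through a wrapper like this.

```ruby
# Hypothetical tracing patch: report who calls String#scan, then run the original.
class String
  # Keep the real implementation reachable under an illustrative alias name.
  alias_method :scan_without_trace, :scan

  def scan(*args, &block)
    # caller(1, 3) returns up to three stack frames above this method.
    $stderr.puts "String#scan called from:"
    caller(1, 3).each { |frame| $stderr.puts "  #{frame}" }
    scan_without_trace(*args, &block)
  end
end

p "a1b22".scan(/\d+/)
```

The trace goes to stderr, so it stays out of the way of whatever the program prints normally.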
Back to the point: DateTime.strptime is slow. The underlying code shows why: see the _strptime_i method in date/format.rb.
Lots of string shuffling, regular expressions to match field specifiers (%d, etc.), string modification with more regexps, and so on. The code is pretty easy to read, but it's still doing a lot of work it doesn't need to do. Luckily, libc comes with a function for parsing times in the same way: strptime.
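To see what that parsing step produces, you can call Date._strptime directly. It maps each field specifier to an entry in a hash. This is a small sketch against the standard date library; the sample date and format are mine, and newer Ruby releases may implement the method differently from the format.rb code described above.

```ruby
require "date"

# Date._strptime does the parsing step only: it returns a hash of the fields
# it matched, or nil when the input does not fit the format.
fields = Date._strptime("12 Sep 2009", "%d %b %Y")
p fields

# A mismatch gives nil rather than raising.
mismatch = Date._strptime("not a date", "%d %b %Y")
p mismatch
```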
So, I started working on an extension to the Time class that invokes libc's strptime and returns a Time instance: ruby-ctime. The usage is simple once you have the module:
require "CTime"
puts Time.strptime("%Y", "2009")
# outputs 'Wed Jan 00 00:00:00 +0000 2009'
The one major drawback of strptime is that there's no wide support for
timezones. Format strings like %Z and %z work with strftime, but are generally
unsupported by strptime. The exceptions: glibc supports %z, and FreeBSD
appears to support both %Z and %z. Nothing reliably cross-platform. This is a
historical problem: the standard 'struct tm' structure has no timezone field
(glibc and the BSDs add 'long tm_gmtoff' to support timezones).
This means we'll have to extend strptime to support timezones ourselves, but I'm not there yet.
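One possible stopgap, until strptime itself learns about timezones, is to peel a trailing numeric offset off the string and let a timezone-unaware parser handle the rest. The sketch below is not what ruby-ctime does: split_utc_offset and OFFSET_RE are hypothetical names, the offset is assumed to be the last whitespace-separated token, and DateTime.strptime stands in for whatever offset-blind parser you actually use.

```ruby
require "date"

# Hypothetical pattern: a trailing "+HHMM", "-HHMM" or "+HH:MM" after whitespace.
OFFSET_RE = /\s+([+-])(\d{2}):?(\d{2})\z/

# Split a timestamp into [text without the offset, offset in seconds].
def split_utc_offset(text)
  match = OFFSET_RE.match(text)
  return [text, 0] unless match # no offset found: assume UTC

  sign = match[1] == "-" ? -1 : 1
  seconds = sign * (match[2].to_i * 3600 + match[3].to_i * 60)
  [match.pre_match, seconds]
end

body, offset = split_utc_offset("2009-09-12 10:30:00 -0700")

# Stand-in for a timezone-unaware parser: it reads the wall-clock fields as UTC.
naive = DateTime.strptime(body, "%Y-%m-%d %H:%M:%S").to_time

# Subtracting the offset turns the wall-clock reading into the real UTC instant.
utc = (naive - offset).utc
puts utc
```

This only covers numeric offsets. Named zones like %Z would need a lookup table, which is a much bigger problem.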
Anyway, a quick benchmark of features supported by both libc strptime and DateTime.strptime shows libc is the massive winner:
snack(~/projects/ruby-ctime) % ruby test.rb
Iterations: 10000
datetime: 7.680928 (1301.92601727291/sec)
my_strptime: 0.126583 (78999.5497025667/sec)
A 60x speedup using the new C code vs DateTime.strptime. This is a great start, but we still need timezone support. I need to hack timezone support into this, which probably means I'll start with glibc's strptime implementation.
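For anyone who wants to reproduce numbers in that shape, a harness along these lines should do it. It is a sketch, not the actual test.rb: the report helper is a hypothetical name, and only the DateTime side is timed because that part needs nothing beyond the standard library.

```ruby
require "benchmark"
require "date"

ITERATIONS = 10_000

# Hypothetical helper: time a block over many iterations, print seconds and
# calls per second, and return the elapsed time.
def report(label, iterations)
  elapsed = Benchmark.realtime { iterations.times { yield } }
  puts format("%s: %f (%f/sec)", label, elapsed, iterations / elapsed)
  elapsed
end

puts "Iterations: #{ITERATIONS}"
datetime_elapsed = report("datetime", ITERATIONS) { DateTime.strptime("2009", "%Y") }
```

With the extension loaded, a second report call wrapping its Time.strptime would give the other line of the comparison.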