Oniguruma - named capture example

For whatever reason, I decided to play with Oniguruma (a newish regular expression library) tonight. I'm considering porting some of grok's functionality to C or C++ for speed. Doing it in C++ would require me to re-learn C++.

The docs are pretty complete but short on examples, and I couldn't find many useful examples on Google either. The API docs themselves are quite good, though, and whatever they didn't answer, the header files did. Excellent.

The result of this adventure is this:

# regex: ^(?<test>.*?)( (?<word2>.*))?$
# input: "hello there"

% gcc -I/usr/local/include -L/usr/local/lib oniguruma_named_captures.c -lonig
% ./a.out "hello there"
word2 = there
test = hello
% ./a.out "foobarbazfizz"
word2 = 
test = foobarbazfizz

Download the code
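
As a quick cross-check of the captures above without compiling anything, the same pattern can be run in Ruby, whose built-in regexp engine (since 1.9) is Oniguruma or its fork Onigmo. This is only a rough sketch, not the C program from the download: the helper name `named_captures_for` is made up for illustration, and `MatchData#named_captures` assumes Ruby 2.4 or later.

```ruby
# Hypothetical helper (not from the original C code): returns a Hash of
# group name => captured text, with nil for a group that did not participate.
def named_captures_for(input)
  # Same pattern as the C example; (?<name>...) is a named group.
  pattern = /^(?<test>.*?)( (?<word2>.*))?$/
  match = pattern.match(input)
  match ? match.named_captures : {}
end

["hello there", "foobarbazfizz"].each do |input|
  named_captures_for(input).each do |name, value|
    # A nil capture interpolates as an empty string.
    puts "#{name} = #{value}"
  end
end
# prints:
#   test = hello
#   word2 = there
#   test = foobarbazfizz
#   word2 =          (empty: the optional group did not match)
```

The lines come out in the order the groups appear in the pattern, so the order may differ from the C output above. In the second input the optional group never matches, so `word2` is `nil` rather than an empty string.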

Ruby/Oniguruma hacking

Last night, I mentioned that I wanted (?{ code }), Perl's embedded-code regex construct, in Ruby and Python.

I got bored tonight and decided to see how hard this would be to implement in Ruby. Turns out it's not as bad as I thought, not that I'm finished yet.

This Ruby script shows a demo of what I have so far; the output is in comments in the script. There are still a few strange bugs, but I've nearly got it working properly. Either my code or the way Oniguruma handles backtracking and failures keeps this from working correctly on strings with multiple potential matches.