Jordan Sissel
geek

Thu, 10 Jan 2008

Boost xpressive dynamic regexp with custom assertions

As it turns out, xpressive is (so far) exactly what I'm looking for.

In Xpressive's docs, a 'dynamic regular expression' means that the regex object comes from compiling a regex string, as opposed to a static regular expression, which is written directly in C++. Very fortunately, you can mix dynamic and static expressions, since both end up as the same kind of object!

What I wanted was dynamic regexps with custom assertions, and here's how you do it:

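#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp>  /* provides check() */
using namespace boost::xpressive;

struct is_private {
  bool operator()(ssub_match const &sub) const {
    /* Some test on 'sub'; return true to accept the match */
    return true;  /* placeholder so the functor always returns a value */
  }
};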

/* somewhere in your code ... */
sregex ip_re = sregex::compile("(?:[0-9]+\\.){3}(?:[0-9]+)");
sregex priv_ip_re = ip_re[ check(is_private()) ];
This is excellent, because the lack of this Perl feature elsewhere is one of the things that kept me from making grok available in any other language.

I have a working demo you can download. I've tested it on Linux and FreeBSD with success. It requires boost 1.34.1 and xpressive 2.0.1. The version of xpressive that comes with boost 1.34.1 is insufficient; you must separately download the latest version of xpressive. I installed it by unzipping it and copying boost/xpressive/* to /usr/local/include/boost/xpressive/, which overwrote the old copy of xpressive I had installed.

Compile with the following (on FreeBSD, the -I flag is needed so g++ finds the headers under /usr/local/include):

g++ -I/usr/local/include -c -o boost_xpressive_test.o boost_xpressive_test.cpp
g++  boost_xpressive_test.o -o xpressivetest
Running it:
% ./xpressivetest 
RFC1918 test on '1.2.3.4': fail
RFC1918 test on '4.5.6.7': fail
RFC1918 test on '192.168.0.5': pass
Match on test1: 192.168.0.5
RFC1918 test on '129.21.60.0': fail
RFC1918 test on '29.21.60.0': fail
RFC1918 test on '9.21.60.0': fail
RFC1918 test on '172.17.44.25': pass
Match on test2: 172.17.44.25
This is exactly the behavior I expected.
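
The RFC1918 test itself isn't shown in the snippet above, so here is a minimal sketch of what such a check might look like. The helper name is_rfc1918 is hypothetical and this is not the code from the demo; it assumes the input is a dotted-quad IPv4 string like the ones the regex matches.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (an assumption, not taken from the demo): returns
// true when 'ip' is a dotted-quad IPv4 address in an RFC 1918 private range.
bool is_rfc1918(const std::string &ip) {
  unsigned a, b, c, d;
  char extra;
  // Require exactly four decimal fields with nothing trailing them.
  if (std::sscanf(ip.c_str(), "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
    return false;
  if (a > 255 || b > 255 || c > 255 || d > 255)
    return false;
  if (a == 10) return true;                         // 10.0.0.0/8
  if (a == 172 && b >= 16 && b <= 31) return true;  // 172.16.0.0/12
  if (a == 192 && b == 168) return true;            // 192.168.0.0/16
  return false;
}
```

One way to wire this into the is_private functor would be to return is_rfc1918(sub.str()) from operator(), so the assertion only accepts addresses in the private ranges.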

Permalink: /geekery/boost-xpressive-testing
posted at: 02:12

Mon, 07 Jan 2008

Oniguruma - named capture example

For whatever reason, I decided to play with oniguruma tonight (a newish regular expression library). I'm considering an effort to port some of grok's functionality to C or C++ for speed reasons. Doing it in C++ would require me to re-learn C++.

The docs are pretty complete, but not very helpful with respect to examples. I wasn't able to find many useful examples on Google, but the API docs are quite good. What wasn't answered by the docs was answered by reading header files. Excellent.

The result of this adventure is this:

# regex: ^(?<test>.*?)( (?<word2>.*))?$
# input: "hello there"

% gcc -I/usr/local/include -L/usr/local/lib -lonig oniguruma_named_captures.c
% ./a.out "hello there"
word2 = there
test = hello
% ./a.out "foobarbazfizz"
word2 = 
test = foobarbazfizz

Download the code

Permalink: /geekery/oniguruma-named-capture-example
posted at: 04:41

Sun, 16 Sep 2007

Ruby/Oniguruma code block patches

I love Perl's (?{ code }) feature. I want it in other languages.

I spent some time hacking this into Ruby a few weeks ago. I finally got around to making patches.

In FreeBSD ports, I chose to build Ruby 1.8.6 with oniguruma as the regex engine. After doing 'make configure' you can apply these patches:

I haven't tested this on other platforms, and it's not feature complete, but it's close.

Permalink: /geekery/ruby-oniguruma-codeblock-patches
posted at: 00:38

Wed, 27 Dec 2006

Query parsing in JavaScript

For pimp, I want to be able to search a specific column, say, artist, without needing multiple fields for searching. The ability to specify more advanced searches than simple keywords is quite useful. How do we leverage this on the client and turn a search query into a set of key-value pairs?

I must confess I was hesitant to put this kind of logic into JavaScript instead of Python. Furthermore, it makes me feel a little uneasy using /foo/ in anything other than Perl. Nonetheless, doing this in JavaScript was simple and it's still fast (as it should be).

The particular types of query I want to parse come in the following (hopefully intuitive) formats:

  • foo
  • artist:Eminem
  • album:"Across a Wire"
  • artist:"Counting Crows" album:august
The following code does this for me. The parse_query function will return a dictionary of query terms. Values are lists.

Here's an example:

Query
rain baltimore artist:"Counting Crows" album:august
Results of parse_query
    { "artist": ["Counting Crows"],
      "album": ["august"],
      "any": ["rain", "baltimore"],
    }
  
I take the dictionary returned and pass it to jQuery's $.post function to execute an AJAX (I hate that term, it's such a misnomer these days) request. Here's the code:
var query_re = /(?:"([^"]+)")|(?:\b((?:[a-z:A-Z_-]|(?:"([^"]+)"))+))/gi;

function parse_query(string) {
  var dict = {}, m, key, val;
  query_re.lastIndex = 0;  // the regex is global; start each parse from the beginning
  while ((m = query_re.exec(string))) {
    // m[1] is a bare quoted phrase; m[2] is a word or a key:value term
    val = (m[1] || m[2]).split(":", 2);
    if (val[1]) { key = val[0]; val = val[1]; }
    else { key = "any"; val = val[0]; }

    val = val.replace(/"/g, "");  // drop the quotes around quoted values

    dict[key] = dict[key] || [];
    dict[key].push(val);  // JavaScript arrays append with push()
  }

  return dict;
}

Permalink: /geekery/search-query-parsing-in-javascript
posted at: 04:46
