Grok + Lucene

Last night I mentioned some ideas about an open source data analytics tool. Today I spent a few minutes cleaning up the code I used to test grok and Lucene.

I used the latest HEAD version of grok to turn Apache logs into JSON and wrote a Java program to read the JSON output into Lucene. The last step was to write a simple search tool to query the data in Lucene.

For a test case, I used a 10000-line Apache access log. To populate the Lucene index, I just ran this:
% ./grok | java GrokJSONImport
Grok (per the config above) outputs a JSON object for each match. GrokJSONImport reads each line, parses it as JSON, and adds each log entry to Lucene as a new document whose fields are the ones grok matched.
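To make the "one JSON line becomes one document" step concrete, here is a minimal sketch of the parsing half only. It is not the real GrokJSONImport: the class name JsonLineFields, the regex-based parsing, and the sample line (built from the field names and values shown in the search output in this post) are all assumptions. It also assumes grok emits flat objects whose values are all strings, and it leaves out the Lucene call that would turn the resulting map into a document.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT the original GrokJSONImport: turns one flat JSON
// object (one per line) into an ordered field map ready to hand to an indexer.
class JsonLineFields {
    // Matches "key": "value" pairs. Assumes every value is a JSON string;
    // nested objects, arrays and bare numbers are not handled.
    private static final Pattern PAIR = Pattern.compile(
        "\"((?:[^\"\\\\]|\\\\.)*)\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    static Map<String, String> parseLine(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            fields.put(unescape(m.group(1)), unescape(m.group(2)));
        }
        return fields;
    }

    // Drops the backslash from simple escapes such as \" \\ and \/.
    // Escapes like \n or \uXXXX are not decoded in this sketch.
    private static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                c = s.charAt(++i); // keep the escaped character itself
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed shape of one line of grok output; the real format may differ.
        String line = "{\"timestamp\": \"18/Jan/2009:04:01:00 -0500\", \"verb\": \"POST\", "
            + "\"request\": \"/hackday08/randomtags.py\", \"response\": \"200\"}";
        // Each entry would become one field of a new document in the index.
        parseLine(line).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

A real importer would more likely use a proper JSON library than a regex; the regex just keeps the sketch free of dependencies.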

Let's search for all successful HTTP POSTs (well, the first 100 hits, since LogSearch.java only asks for 100 hits):

% java LogSearch '+response:200 +verb:post' timestamp verb request response
Found 5794 hits.
timestamp: 18/Jan/2009:04:01:00 -0500
verb: POST
request: /hackday08/randomtags.py
response: 200

timestamp: 18/Jan/2009:04:01:05 -0500
verb: POST
request: /hackday08/randomtags.py
response: 200

< remainder of output cut >
Most of the hits are related to 'randomtags.py', which is a CGI script used by my Yahoo Pipes hack, SnackUpon. Let's filter out all of those requests:
% java LogSearch '+response:200 +verb:post NOT request:/hackday08/randomtags.py' timestamp verb request response
Found 91 hits.
timestamp: 18/Jan/2009:09:12:04 -0500
verb: POST
request: /blog/geekery/217
response: 200

timestamp: 18/Jan/2009:09:16:02 -0500
verb: POST
request: /blog/static/about#comment_anchor
response: 200

< remainder of output cut >
What if I want to see some non-200 response code GETs? Turn the query into 'verb:get NOT response:200' and you're done.
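In these queries, a '+' prefix marks a clause as required and NOT marks it as prohibited. The sketch below mimics that filtering over plain in-memory maps so the logic is easy to see. It is not Lucene code: ClauseFilter, Clause, and Occur are made-up names for illustration, the case-insensitive comparison is an assumption standing in for whatever analyzer lets 'verb:post' match 'POST', and the 404 entry in the sample data is invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// In-memory illustration of required (+) and prohibited (NOT) clauses.
// This is NOT Lucene code; ClauseFilter and its members are made-up names.
class ClauseFilter {
    enum Occur { MUST, MUST_NOT }

    static final class Clause {
        final Occur occur;
        final String field;
        final String value;

        Clause(Occur occur, String field, String value) {
            this.occur = occur;
            this.field = field;
            this.value = value;
        }
    }

    // A document matches when every MUST clause hits and no MUST_NOT clause hits.
    // Values are compared case-insensitively (an assumption, see the lead-in).
    static boolean matches(Map<String, String> doc, List<Clause> clauses) {
        for (Clause c : clauses) {
            String actual = doc.get(c.field);
            boolean hit = actual != null && actual.equalsIgnoreCase(c.value);
            if (c.occur == Occur.MUST && !hit) return false;
            if (c.occur == Occur.MUST_NOT && hit) return false;
        }
        return true;
    }

    // Returns at most `limit` matching documents, like LogSearch's 100-hit cap.
    static List<Map<String, String>> search(
            List<Map<String, String>> docs, List<Clause> clauses, int limit) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (hits.size() >= limit) break;
            if (matches(doc, clauses)) hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample documents; the third entry is invented for the example.
        List<Map<String, String>> docs = List.of(
            Map.of("verb", "POST", "request", "/hackday08/randomtags.py", "response", "200"),
            Map.of("verb", "POST", "request", "/blog/geekery/217", "response", "200"),
            Map.of("verb", "GET", "request", "/blog/geekery/217", "response", "404"));
        // Equivalent of: +response:200 +verb:post NOT request:/hackday08/randomtags.py
        List<Clause> query = List.of(
            new Clause(Occur.MUST, "response", "200"),
            new Clause(Occur.MUST, "verb", "post"),
            new Clause(Occur.MUST_NOT, "request", "/hackday08/randomtags.py"));
        for (Map<String, String> hit : search(docs, query, 100)) {
            System.out.println(hit.get("verb") + " " + hit.get("request") + " " + hit.get("response"));
        }
    }
}
```

Swapping the clauses for a required verb of "get" and a prohibited response of "200" models the 'verb:get NOT response:200' variant, treating the lone unprefixed term as required for the purposes of this sketch.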

Pretty cool, eh? :)

