Pyblosxom comment antispam plugin

Ever since I added comments to this site, I've been getting comment spam. To combat this, I hacked together a comment management system using jQuery and Python. It lets me search comments and delete them via a web interface.

I'm bored of deleting comments by hand. So, I wrote a little antispam plugin. This plugin creates a token that expires after a given period of time. This token is used as a hidden item in the comment form. If this token is expired when the form is submitted, the comment is rejected.

Spam seems to come entirely from lone, single-connection POST requests. This means the bots don't bother viewing the page first. In theory, a bot submits a cached copy of the form, whose token will have expired. We'll see how well this works.

Right now it just uses a timestamp. If that fails, I'll add other tokens such as source IP, etc. Perhaps cookies too? This should be simple to filter out, because the spam bots don't act anything like humans with regard to browsing behavior.
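
The mechanics are tiny. A minimal sketch of the idea (the plugin source isn't posted yet, so the names and the ten-minute lifetime here are invented):

import time

TOKEN_LIFETIME = 600   # seconds; the real expiry period is configurable

def make_token():
    # goes into the form as <input type="hidden" name="token" value="...">
    return str(int(time.time()))

def token_is_valid(token):
    try:
        issued = int(token)
    except ValueError:
        return False              # mangled or forged token
    return time.time() - issued <= TOKEN_LIFETIME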

I have enabled the plugin on this site. I'll post the source when I see it actually working correctly.

Waiting for spam bots to come by is boring :(

Python's seeming lack of a good parser

I've been searching for a decent recursive descent parser for Python. Too bad I haven't found one :(

None are truly standalone, though many claim to be generators. Either generate code or give me a nice parser library, not half-assedly in the middle! Urgh!

ANTLR's generated code depends on an 'import antlr' runtime module, and ANTLR itself needs Java; PLY is similar. Others simply suck. Who wants to lug around piles of libraries and modules? I don't. PLY may be an option, but it may be some time before I can write a decent grammar with it. Perhaps in a day or two when I have more time.

Granted, I'm probably just frustrated from many hours of trying parsers without success. It's not that none of the parsers work; it's that none of them are as easy to use as Perl's Parse::RecDescent.

All I want is to parse an extremely simple config file of my own design. I may not even need recursive descent, seeing as I only go one level deep. I would settle for a token parser that suits my needs (cfgparser is too limited, shlex is broken), but I haven't been able to find one.

I got a config file parser working in the older grok using Parse::RecDescent in only a few hours, and I was using it successfully within the first 10 minutes. Have parsers fallen by the wayside with the advent of XML as a cure-all?

This pisses me off. I should be able to say, "here's the grammar for my data" and be happy. I really wanted to get the config parser done in py grok tonight. I'm giving serious consideration to adding multiline and statefulness support to grok, just so I can parse a damned config file. That is, use grok to read its own config file so that we can grok whatever data the config file says.
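
For the record, this is roughly the sort of thing I keep having to hand-roll. A sketch of a tiny recursive-descent parser for an invented, grok-flavored config syntax (not actual grok code, just the shape of the problem):

import re

tokens_re = re.compile(r'[{}=;]|"[^"]*"|[^\s{}=;]+')

def parse(text):
    toks = tokens_re.findall(text)
    pos = [0]

    def peek():
        if pos[0] < len(toks):
            return toks[pos[0]]
        return None

    def take():
        tok = peek()
        pos[0] += 1
        return tok and tok.strip('"')

    def block():
        result = {}
        while peek() not in (None, '}'):
            name = take()
            if peek() == '=':            # key = value;
                take()
                result[name] = take()
            elif peek() == '{':          # name { ... };
                take()
                result[name] = block()
                take()                   # the closing '}'
            if peek() == ';':
                take()
        return result

    return block()

print parse('logfile = "/var/log/auth.log"; options { debug = 1; };')
# {'logfile': '/var/log/auth.log', 'options': {'debug': '1'}} (key order may vary)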

If you're reading this and have suggestions for Python text-parsing modules that do not suck, please let me know.

Python's dict objects can be merged

Much to my surprise, Python dict objects can be merged. I've needed this a few times but never knew about it, mostly because 'pydoc dict' doesn't mention 'merge' anywhere.
>>> a = { 1: 2, 3: 4 }
>>> b = { 1: 100, 5: 6 }
>>> b.update(a)
>>> print b
{1: 2, 3: 4, 5: 6}
dict.update() takes a single dict (or any mapping) and folds its keys in, so you can merge several dicts into one by calling it once per source. Conflict resolution, it seems, is "last one in wins."
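
For example, merging a few dicts in sequence (variable names invented):

defaults    = {'color': 'blue', 'size': 10}
site_config = {'size': 12}
user_config = {'color': 'red'}

merged = {}
for d in (defaults, site_config, user_config):
    merged.update(d)   # keys from later dicts overwrite earlier ones

print merged   # {'color': 'red', 'size': 12} (key order may vary)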

Grok to be rewritten in Python

I spent a few hours tonight working on new features for grok and kept running into problems keeping track of data structures in my head. Grok currently makes heavy use of hash-of-hash-of-hash-of-ha...-type data structures in Perl. Remembering context is annoying and slows development.

I decided that grok could use some serious refactoring. So much refactoring, in fact, that I could probably get away with rewriting it faster than redesigning portions of it. Since I need to know Python better, and I am more familiar with OO in Python than in Perl, I figure I should rewrite grok in Python. Python already has one critical feature I need in grok: named captures in regular expressions. The hack for this in Perl is unwieldy and potentially unmaintainable if future Perl versions change the format or deprecate it; it is listed in 'perldoc perlre' as experimental.
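
For reference, named captures in Python look like this (a trivial example, not grok code):

import re

# named groups: the program[pid] chunk of a syslog line
m = re.match(r'(?P<prog>\w+)\[(?P<pid>\d+)\]', 'sshd[1234]')
print m.group('prog'), m.group('pid')   # sshd 1234
print m.groupdict()                     # {'prog': 'sshd', 'pid': '1234'}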

At any rate, I made a little prototype that tries to be very OO. My experience with Good(tm) object oriented programming is still limited. The CS curriculum at RIT sucked for teaching proper OO: too many professors taught wildly different styles or were unclear about what Good(tm) OOP should look like.

Therefore, rewriting grok is a good opportunity to explore test-driven development and maintainable object-orientation. Oh, and synergism too. *shifty eyes*

I've got a bit of code up and running already, and writing "for tests" seems to be a very cool way to think about programming. If I force myself to write easily-testable code, then writing tests is easy. Furthermore, initial experience seems to show that adding new features is much easier when all of the code is compartmentalized.

If nothing else, I wrote a somewhat cool debug method that accesses the call stack for function, class, module, etc. Check out the 'debuglib.py' file. The output looks something like this:

grok/groklib.py:52] RegexGenerator.saveRegex: happy message here
The file, line number, class and function name are all discovered magically in debug(). I like this.
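
I won't paste debuglib.py here, but the trick is the stdlib inspect module. A rough reconstruction (not the actual code):

import inspect, os

def debug(msg):
    # one frame up the stack is whoever called debug()
    frame, filename, lineno, function = inspect.stack()[1][:4]
    cls = ''
    if 'self' in frame.f_locals:   # the caller is a bound method
        cls = frame.f_locals['self'].__class__.__name__ + '.'
    print '%s:%d] %s%s: %s' % (os.path.basename(filename), lineno,
                               cls, function, msg)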

If you get bored, you can look at the original stuff here: scripts/grok-py-test

Pyblosxom single-entry page title plugin

The page titles pyblosxom provides are usually great. However, when there is only one entry displayed, I feel it would be better to rely on that entry's title.

I wrote a very short plugin to do just that. It turns out the plugin API for pyblosxom is quite easy to understand, and this hack was only about 10 lines.

pagetitle.py adds a new variable which contains the standard page title, unless there is only one entry in view, in which case the page title is augmented with that entry's title as well. This makes search engines and browsers happier, as they can recognize what your page is about from its title. Good for the user experience, and good for search engines.

The new variable you want to use is: $blog_title_or_entry_title
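
The whole thing boils down to something like this sketch, written from memory against pyblosxom's cb_prepare callback; the separator and the exact accessors are my guesses, so grab the real pagetitle.py below:

def cb_prepare(args):
    request = args["request"]
    data = request.getData()
    config = request.getConfiguration()

    title = config.get("blog_title", "")
    entries = data.get("entry_list", [])
    if len(entries) == 1:
        # single-entry view: append that entry's own title
        title = "%s :: %s" % (title, entries[0]["title"])
    data["blog_title_or_entry_title"] = title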

If you want to get a better idea of what this plugin does, you can click the permalink below to view only this entry. The page title (shown in your browser's title bar) should then reflect this entry's title.

download pagetitle.py

Wrap method calls in Python

Function wrapping is quite useful, especially when you need to make code threadsafe by wrapping it with a mutex locker, or when adding debug entry/exit traces. We can easily wrap methods in Python using lambda.

A standalone module for wrapping can be found here: wrap.py. If you don't understand what the * and ** stuff means, that's fine; I'll post about those shortly.
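
If you don't feel like downloading it, the heart of the technique looks roughly like this (a plain closure; the lambda version in wrap.py behaves the same way):

def wrapmethod(func, pre=None, post=None):
    # build a function that calls pre, then func, then post,
    # handing each one the same arguments
    def wrapped(*args, **kwds):
        if pre:
            pre(*args, **kwds)
        result = func(*args, **kwds)
        if post:
            post(*args, **kwds)
        return result
    return wrapped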

A fun, crappy example can be found here: wrapexample.py

That example shows how to wrap a simple method (X.Foo) with pre- and post-execution function calls. Notice how we can access the parameters passed to the original function (Foo) in both the pre and post functions. That's all good and pretty, but how about a better example?

A better example would be to wrap a function call with a mutex locker.

Let's take an example class happyfoo. A sample script that uses happyfoo can be found here: main.py. However, time passes and now I require the happyfoo.makenoise method to be locked while we are inside it. If you look at the code, it doesn't lock and is not threadsafe (for our purposes).

In an ideal situation, you might add locking to the 'happyfoo.py' module itself. What if you can't do that (no access) or don't have time to hack through the code? There's an easier way.

Python lets you modify classes at runtime. The new locking code can be found here: main-locking.py

The coolest part about this is that I do not have to modify the 'happyfoo.py' module. Perhaps this is a dangerous feature, but I think it's neat. Anyway, the bulk of the new code should be self-explanatory, with the possible exception of this:

happyfoo.makenoise = wrap.wrapmethod(happyfoo.makenoise, do_lock, do_unlock)
This is where I override the 'happyfoo.makenoise' method with a generated one that calls the original 'happyfoo.makenoise' function wrapped in the 'do_lock' and 'do_unlock' functions. If you run the script, you'll see the locking as well as the threads waiting for the lock.
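
The supporting bits in main-locking.py amount to a mutex and two tiny functions, something like this reconstruction:

import threading
import wrap, happyfoo   # the modules from the tarball below

lock = threading.Lock()

def do_lock(*args, **kwds):
    lock.acquire()

def do_unlock(*args, **kwds):
    lock.release()

happyfoo.makenoise = wrap.wrapmethod(happyfoo.makenoise, do_lock, do_unlock)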

If you want to download all of the code from the post, try this tarball: python-method-wrapper.tar.gz

The wrap module needs a lot of work, potentially. It would be nice to be able to wrap and also pass other arguments to both pre and post functions. I've got a hack that adds a reference to the function being called to the keyword (kwds) list, which lets you figure out which function is actually being called. Useful if you use the same pre/post functions to wrap more than one function.

New event recording database prototype

I finally managed to find time today to work on my events database project. In the process, I found a few bugs in grok that needed fixing: some of my regular expressions were being a bit greedy, so certain pattern expansions were breaking.

To summarize the event recording system, it is a webserver listening for event publish requests. It accepts the "when" "where" and "what" of an event, and stores it in a database.

To have my logs pushed to the database, I'll leverage the awesome power of Grok. Before I do that, I gathered all of the auth.log files and archives and compiled them into their respective files.

The grok.conf for this particular maneuver:

exec "cat ./logs/nightfall.auth.log ./logs/sparks.auth.log ./logs/whitefox.auth.log" {
   type "all syslog" {
      match_syslog = 1;
      reaction = 'fetch -qo - "http://localhost:8080/?when=%SYSLOGDATE|parsedate%&where=%HOST%/%PROG|urlescape|shdq%&what=%DATA:GLOB|urlescape|shdq%"';
   };
};
This is fairly simple. I added a new standard filter, 'urlescape', to grok because I needed it. It URL-escapes a data piece. Hurray!

Run grok, and it sends an event notification to the webserver for every syslog-matching line, using FreeBSD's command-line web client, fetch.

sqlite> select count(*) from events;
8085
Now, let's look for something meaningful. I want to know what happened on all sshd services between 1am and 4am this morning (today, May 3rd):
nightfall(~/projects/eventdb) % date -j 05030100 +%s
1146632400
nightfall(~/projects/eventdb) % date -j 05030400 +%s
1146643200
Now I know the Unix epoch times for May 3rd at 1am and 4am.
sqlite> select count(*) from events where time >= 1146632400 
   ...> and time <= 1146643200 and location like "%/sshd" 
   ...> and data like "Invalid user%";
2465
This query is instant, much faster than doing 'grep -c' on N log files across M machines. I don't care how good your grep-fu is; you aren't going to be faster.

This speed feature is only the beginning. Think broader terms: nearly instantly zoom to any point in time to view "events" on a system or set of systems. Filter out particular events by keyword or pattern. Look for the last time a service was restarted. I could go on, but you probably get the idea. It's grep, but faster, and with more features.

As far as the protocol and implementation go, I'm not sure how well this web-based concept is going to prevail. At this point, I am not interested in protocol or database efficiency; the prototype implementation is good enough. From what I've read about Splunk over the past months, in the form of advertisements and such, it seems I already have Splunk's main feature: searching logs easily. Perhaps I should incorporate and sell my own better-than-Splunk product? ;)

Bear in mind that I have no idea what Splunk actually does beyond what I've gleaned from advertisements for the product. I'm sure it's at least somewhat useful, or no one would invest.

Certainly, a pipelined HTTP client could perform this much faster than doing 10000 individual HTTP requests. A step further would be having the web server accept any number of events per request. The big test will be seeing how well HTTP scales, but that can be played with later.

At this point, we have come fairly close to the general idea of this project: Allowing you to zoom to particular locations in time and view system events.

The server code for doing this was very easy. I chose Python and started playing with CherryPy (a webserver framework). I had a working event receiver server in about 30 minutes; 29 of those minutes were spent writing a threadsafe database class to front for pysqlite. The CherryPy bits only amount to about 10 lines of code out of 90-ish.
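
The threadsafe wrapper is the only interesting part, and it's just a mutex around every query. A minimal sketch (using the stdlib sqlite3 module here instead of pysqlite; the real code is linked below):

import threading, sqlite3

class SafeDB:
    """Serialize all access to one sqlite connection behind a mutex."""
    def __init__(self, path):
        self.lock = threading.Lock()
        self.conn = sqlite3.connect(path, check_same_thread=False)

    def execute(self, sql, params=()):
        self.lock.acquire()
        try:
            cursor = self.conn.execute(sql, params)
            self.conn.commit()
            return cursor.fetchall()
        finally:
            self.lock.release()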

The code to do the server can be found here: /scripts/cherrycollector.py

Python is getting on my nerves

Add the lack of dynamic assignment to my "I wish Python had Foo" list.

Python does not appear to have dynamically assignable arrays. Where are we, C? Assembly? When I assign past the end of the array, I mean resize the god damned array. Thanks.

nightfall(~) % python -c "foo = []; foo[3] = 234"
Traceback (most recent call last):
  File "<string>", line 1, in ?
IndexError: list assignment index out of range
This is completely unacceptable. Sure, I can use a list comprehension to make an N-element array that's empty:
foo = [None for x in range(100)]
foo[44] = "Hi"
That only gets me an array with 100 empty elements, which is not what I want. Worse, if I did this to an array already holding data I didn't want to lose, I'd lose all the data.
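
The obvious workaround is to grow the list on demand without clobbering what's already in it (helper name invented):

def grow_and_set(lst, index, value, pad=None):
    # pad out to the target index, then assign;
    # existing elements are left alone
    if index >= len(lst):
        lst.extend([pad] * (index + 1 - len(lst)))
    lst[index] = value

foo = [1, 2]
grow_and_set(foo, 3, 234)
print foo   # [1, 2, None, 234]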

Sigh...

Shoutcast stream 'lame' proxy

Many of my mp3s are of such a high bitrate that they saturate my crappy 30k/s DSL connection. To solve that problem, I wrote a proxy that connects to the real shoutcast server and essentially pipes the output through lame before sending it to you. Doing this, I was able to easily down-encode any mp3 output to something more reasonable for streaming, such as 64kbit.

If you want to take a look at it, it's only 38 lines of Python.

lameproxy.py
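
The core of it looks roughly like this from-memory sketch (not the linked file; the URL is a placeholder and the lame flags are the usual stdin-to-stdout ones):

import subprocess, threading, urllib2

UPSTREAM = 'http://localhost:8000/stream/happystream'   # placeholder
BITRATE = '64'

def pump(src, dst):
    while True:
        chunk = src.read(4096)
        if not chunk:
            break
        dst.write(chunk)
    dst.close()

def relay(client):
    # re-encode the upstream mp3 at a lower bitrate on its way through
    lame = subprocess.Popen(['lame', '--mp3input', '-b', BITRATE, '-', '-'],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # feed lame in the background so neither pipe stalls the other
    feeder = threading.Thread(target=pump,
                              args=(urllib2.urlopen(UPSTREAM), lame.stdin))
    feeder.start()
    pump(lame.stdout, client)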

This is probably going to become a part of Pimp itself. Instead of accessing the normal '/stream/happystream' you could do '/stream/happy?bitrate=128' and it will lame-it up for you. I've *always* wanted this feature in Pimp since version 2 (version 1 wasn't networked).

On a funnier note, it seems like the other media guys are catching on to "networked is good": XMMS2 is being written from scratch, so I hear, and it's going to sport a client/server model. So far MPD2, XMMS2, and GStreamer are the well-known ones. Whatever; as far as I can tell, they all keep the library and the player in the same place. Pimp abstracts it one more level: a control client, a server, and a media client. In this case, the control client is Firefox and the media client is your favorite mp3 player. Those other projects will eventually catch up to me, I suppose ;)

ajax, json, and python, Oh my!

Whew! I haven't spent this many hours on dead-end ideas in a very long time. Most of the ideas I tried failed miserably, due either to problems with my design or problems with the underlying application. For instance, I found I couldn't return HTML from an XMLHTTPRequest call and append it to my document properly. Ugh!

I've been doing a lot of brainstorming on how to spiff up the interface Pimp 4.0 is going to sport. I wanted a web interface that is useful and also looks cool in the process - something many interface programmers strive for but rarely achieve.

There are a number of very good JavaScript libraries such as Prototype, Scriptaculous, and others, so I figured I'd try them out and see what I could do. It turns out Scriptaculous would be the most useful, but it doesn't work under XHTML with content-type application/xhtml+xml.

Having hit a number of brick walls, I decided to scrap using other people's code and write my own. I always learned more that way anyway.

The first step was to write a few small visual effects. The only two I've needed so far have been fading in and out. The code makes use of Accelimation, a movement/acceleration library by Aaron Boodman over at youngpup.net.

After that, I wanted to reengineer the JavaScript bits for Pimp. So, I did that. Instead of lots of loosely-glued-together functions, I have one JavaScript "object" thing called 'Pimp' that has a number of functions. I was going to go all Object Oriented and such, but doing OO in JavaScript with timers and such is a pain in the ass. I gave up on OO and went with a simpler approach, a static-class-like objecty thing ... JavaScript is an interesting beast.

I also moved to using JSON for server-to-client communication. Clients still send data using XMLRPC, but the responses are in JSON because there's no added processing time for the client to handle it. Thankfully, it was easy to replace the XMLRPC stuff with JSON where necessary. I only had to update one function in Pimp's Python code and add an 'eval' line to the JavaScript.
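
Server-side, the change is basically one line per handler: serialize the response with a JSON library instead of building an XML-RPC response. A sketch with simplejson (the handler and data here are invented for illustration):

import simplejson   # the stdlib json module works the same way these days

def stream_list():
    # this used to return an XML-RPC response; now the client
    # receives JSON and eval()s it into a JavaScript structure
    streams = [{'name': 'happystream', 'song': 'Some Artist - Some Song'}]
    return simplejson.dumps(streams)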

Putting all of this together, Pimp's "stream list" now updates smoothly in realtime: new streams fade into the list, and song changes crossfade on the screen. I think it looks very cool, and it doesn't detract from the usefulness of the page. Another addition is fading between the "stream list" view and viewing an individual stream. The stream list fades out while the individual stream data loads (efficient waste of time, no?), followed by the individual stream's page fading in.

If you want to see what I've done so far, let me know. It may not work when you ask due to active development, though! Anyhoo, if you're interested, I'll be happy to show you.

My JavaScript brain-cells were rusty yesterday. I suppose they aren't anymore ;)