Search this site


[prev]  Page 3 of 4  [next]

Metadata

Articles

Projects

Presentations

perl makes addition fun!

I like calculators...

perl -lne 'print $a+=$_'

happy perl obfuscation #425234569807


perl -e '$foo = "hello there"; $foo =~ /(?{chop}){4}/; print "$foo\n"'

Nuff said.

You may have to look up what (?{...}) does in perldoc perlre

HTTP::Handle 0.2

After using HTTP::Handle heavily in my new link spider for work, I found a bunch of bugs and pitfalls that would snag it and make it eat iself. The lacking-features and bugs I found were added and fixed, respectively.

* $http->connect() now returns -1 on failure.
   Connections will fail on 3 occaisions: socket() fails, connect() fails, or the dns lookup on the hostname fails.
* After the socket is connected, $http->connect() can fail if no data is recevied in a given timeout period, which can be set. See perldoc HTTP::Handle.

It should be up on CPAN sometime later this morning, I just uploaded it.

Regular expression to match quoted and nonquoted strings

The method I was using to match quoted strings didn't work all that well and it wasn't entirely flexible - so I wrote a lamer version which works for everything I've tried -

# Make the underlined part match characters that can't be in a word.
# For HTML, set this to [^\s>]
my $regex = q!(?:"([^"]*)"|'([^']*)'|([^\s]+))!;

while (<>) {
	while (s/^$regex\s+//) {
		my $string = $1 || $2 || $3;
		print "Quoted string: $string
";
	}
}

It works, I tested it. Here's some sample output:

nightfall(~) > perl quoteregex.pl
testing
Quoted string: testing
foo bar baz
Quoted string: foo
Quoted string: bar
Quoted string: baz
"hello there" how are 'you doing'
Quoted string: hello there
Quoted string: how
Quoted string: are
Quoted string: you doing
'foo
Quoted string: 'foo
'foo bar baz
Quoted string: 'foo
Quoted string: bar
Quoted string: baz

For an example on how to get this to work with html, here's something that'll pull all the links from a webpage (anchor tags, and only the 'href' attribute):

#!/usr/bin/perl

use strict;
use HTTP::Handle;

my $hd = HTTP::Handle->new();
my $regex = q!(?:"([^"]*)"|'([^']*)'|([^\s>]+))!;

$hd->url($ARGV[0] || "http://www.google.com");
$hd->connect();
my $fd = $hd->fd();

undef $/;
my $source = <$fd>;

while ($source =~ s/<a?s+(?:[^>]+s+)*href=$regex[^>]*>//s) {
	my $link = $1 || $2 || $3;
	print "Link: $link\n";
}

Removing duplicates from arrays in Perl

After needing to do this in a project of mine, I went googling and found, as expected, a wide variety of solutions. Solutions ranged from using map, foreach, grep, etc.. All using things like a temporary hash to count instances and ensure uniqueness - but I remembered that hashes have unique keys and that hashes are often treated the same way as arrays in perl, so my solution is as follows:

my %foo;  #Temp var
my @a = qw (hello there hello how are you today there what now hello hello hello);

%foo = @a;
@a = keys(%foo);

I also have a one-liner version:

# Assumedly, @a is already defined and has stuff in it, perhaps...

@a = do { my %foo = @a; keys(%foo) };

Neat Perl DBI features

Something I'd never bothered doing is reading perldoc DBI. I was looking through it today looking for a function I forgot the name of and I ran across a function, selectall_hashref. This thing is *totally* sweet. It takes a SQL query and a key field as arguments. It then puts all the results into a hash with the "key field" as the hash key. I never knew about this until now. If you're still at a loss for what it saves you, here's what it shortens:

my $db = DBI->connect(...);

my $res = $db->preprae("SELECT * FROM foo");
$res->execute();

my $foo;
while (my $hr = $res->fetchrow_hashref()) {
	$foo->{$hr->{"id"}} = $hr;
}

And using selectall_hashref instead:

my $db = DBI->connect(...);

my $foo = $db->selectall_hashref("SELECT * FROM foo", "id");

tic 1.0 / Term::Shelly 0.2

Candice wanted a nonsucky, terminal-based aim client, and I was bored enough to write one. So after about 3 months of off-and-on development I've got enough work done on it to safely say it's useable. She and I both use it more or less regularly now. Including work done on my new module, Term::Shelly, I've decided to tag a release. tic's mostly bug free at the moment and supports a fair set of the features in AIM.

Features of tic:

  • Basic functionality: sending messages, receiving them, checking info, setting away, seeing who's online, etc.
  • Aliases can be created on demand to do whatever you want. You can cascade aliases aswell, meaning you can have one alias call another.
  • Logging is supported, logs go to ~/.tic/
  • Persistent settings can be put in ~/.ticrc. See below for more info on this.
  • Tab completion of all commands, aswell as completion of screennames for certain commands (like /msg)
  • A shortcut for replying to whoever messaged you most recently - typing ; at the beginning of the line will auto fill in "/msg [whoever]" for you.

You'll need 3 perl modules: Net::OSCAR, Term::ReadKey, and a module I wrote, Term::Shelly. The first two you can get from cpan. However, Term::Shelly hasn't been uploaded yet (I'm lazy) so you'll have to download it from me:
Term::Shelly

.ticrc syntax
The basic syntax is: key = value
Comments are prefixed with #, and whitespace between the key, =, and value. The following settings are available:

  • screenname = someusername
  • password = yourpassword
  • port = 8080
  • log = all|off
  • timestamp = on|off

Download tic over there on the right...

Term Abbreviation Handling and Ambiguity Resolution

So I went along looking into implenting good term abbreviation system for pimp's protocol. I've had previous experience doing this, but it was an utter pain to do. I found two pretty decent ways of doing this in my research.

What is term abbreviation? Let's say that for a program you enter commands on a line. You have a lot of commands, and you get tired of typing the full word for every single thing. Term abbreviation allows you to type a partial command and the system will understand what command you mean. This can be seen on VMS systems, some MUD servers, etc.

For example, let's say there's a command called "status". To execute this command you would normally have to type the full word. This becomes tedius after multiple recurrances. With term abbreviation, you can simply type "s" or "st", for instance, and the system will acknowledge that you really mean "status.

My original implementation went something like this (in perl):

# $string is some word that has been set beforehand.
if ($string =~ m/^s(t(a(t(u(s)?)?)?)?)?$/i) {
	# execute code for the status command
	# ...
} elsif ($msg =~ m/^h(e(l(p)?)?)?$/i) {
	# execute code for the help command
	# ...
} else {
	# unknown command, yadda yadda yadda
}
# ...

Obviously typing all that for a regular expression match allowing abbreviation is not only annoying to type but terribly difficult to change and maintain - you have to count parenthesis and question marks, etc. Ugh! Want more pain? Try using backreferences. Yeah, how about this example trapping extra arguments to the command:

my $string = "help fishing";
#...
} elsif ($string =~ m/^h(e(l(p)?)?)? (.*)$/i) {
	print "User requested help for $4\n";
}

If you aren't familiar with backreferences, $4 refers to the fourth group match in that regular expression - count the open parenthesis and you'll see that the (.*) is the fourth group. This is particularly nasty because for every command you want to grok arguments for (atleast in this particular fashion) you have to count how many groups you have.

So, that being a silly idea, there are atleast two other options we can use. One way is to essentially swap the two parameters in our regex comparison. Instead of seeing if $string is a shortened version version of "status", see if $string matches $status.Check this:

my $cmd = "status";
if ("status" =~ m/^$cmd/) {
	# $cmd is "status" or is a shortened version of it.
	print "Status!\n";
} elsif ("help" =~ m/$cmd/) {
	# $cmd is "help" or is a shortened version of it.
	print "Help!\n";
}

Ok, so, we're good now, right? No painful regular expressions, no painful maintanence when we want to change commands later. Neato. There is one small (major?) flaw in this design, however, it can be easily corrected. What if $cmd somehow contains characters special to regular exprsesions? Your match may fail entirely becuase of the way perl handles variables inside regular exprsesion patterns! Not to fear, there is an easy solution - perldoc perlre states that \Q will "quote (disable) pattern metacharacters till \E." So we make one small change to our expressions:

my $cmd = "status";
if ("status" =~ m/^\Q$cmd\E/) {
	# $cmd is "status" or is a shortened version of it.
	print "Status!\n";
} elsif ("help" =~ m/^\Q$cmd\E/) {
	# $cmd is "help" or is a shortened version of it.
	print "Help!\n";
}

All better. Now it doesn't matter what characters are in $cmd our expression won't fail due to improper syntax. It is extremely important to remember that.

What if you don't like lots of if-elsif-else statements? Ok, there's a solution for you. It involves using hot grep and eval action:

my @commands = qw(status help info);

# Let's say for this example the user tped "st" for "status"
my $input_command = "st";

my ($match) = grep { /^\Q$input_command\E/ } @commands;

eval "&{$match}";

sub status { print "status command called\n"; }
sub help { print "help command called\n"; }
sub info { print "info command called\n"; }

Much much shorter, and is good if you prefer it as such. This also helps split your code into simple functions instead of normal top-down code. One final problem exists: What if there is an ambiguity among two similar commands? That is, there might be two commands that, when shortened too much are the same. For example, there might be a "status" command and a "start" command. What now? Easy fix:

my @commands = qw(status help info);

# Let's say for this example the user typed "st" for "status"
my $input_command = "st";

my @matches = grep { /^\Q$input_command\E/ } @commands;


if (scalar(@matches) > 1) {
	print "The command '$input_command' is ambiguous.\n";
} elsif (scalar(@matches) == 0) {
	print "There is no such command or abbreviation '$input_command'\n";
} else {
	eval "&{$matches[0]}";
}


sub status { print "status command called\n"; }
sub help { print "help command calle\n"; }
sub info { print "info command called\n"; }

*UPDATE* A fellow perl monger by the name of John Resig was kind enough to submit some extra code that also implements term abbreviation. It's a slightly different alternative:

my $in = "help";

my $cmds = {
	status => sub { print "Status!\n"; },
	help => sub { print "Help!\n"; },
	helpme => sub { print "Help Me!\n"; }
};

my @m = grep {/^\Q$in\E/i} keys %{$cmds};

if ( @m > 1 && (!exists $cmds->{$in}) ) {
	print "Ambigious!\n";
} elsif ( @m == 0 ) {
	print "No Such!\n";
} else {
	&{$cmds->{$m[0]}};
}

Whew! Now we have a smart solution that's terribly easy to maintain.

Until next time... later :)

boredom in my SE class

I was curious to see how much slower sprintf was in perl than an equivalent print statement. Using Debug::Profile hotness, after a million calls it turns out that sprintf is insignificantly slower. Yay boredom.

time elapsed (wall):   47.2054
time running program:  34.0098  (72.05%)
time profiling (est.): 13.1956  (27.95%)
number of calls:       2000000

%Time    Sec.     #calls   sec/call  F  name
60.20   20.4722        0  20.472220  *  
21.04    7.1548  1000000   0.000007     main::sp
18.77    6.3828  1000000   0.000006     main::nosp

sub sp {
   return sprintf("(%s) %s - %s\n", $album, $artist, $title);
}

sub nosp {
   return "(" . $album . ") " . $artist . " - " . $title . "\n";
} 

Neat perl trick

I've never known how programs change what their command-lines are listed as in ps(1) output. I knew that $0 in perl would let you view what the program was called as, so I took the chance of trying to actually *set* this variable. Here's what happened:
whack(~) [530] !130! > perl -e 'sleep'&
[1] 89053
whack(~) [531] > ps | grep perl
89053  p3  SL     0:00.00 perl -e sleep
...
whack(~) [523] > perl -e '$0 = "Hello there!"; sleep' &
[1] 89119
whack(~) [524] > ps | grep perl
89119  p3  SL     0:00.00 Hello there! (perl)
Neat? I think so... You may not, whatever :)