Search this site


Metadata

Articles

Projects

Presentations

Term Abbreviation Handling and Ambiguity Resolution

So I went along looking into implenting good term abbreviation system for pimp's protocol. I've had previous experience doing this, but it was an utter pain to do. I found two pretty decent ways of doing this in my research.

What is term abbreviation? Let's say that for a program you enter commands on a line. You have a lot of commands, and you get tired of typing the full word for every single thing. Term abbreviation allows you to type a partial command and the system will understand what command you mean. This can be seen on VMS systems, some MUD servers, etc.

For example, let's say there's a command called "status". To execute this command you would normally have to type the full word. This becomes tedius after multiple recurrances. With term abbreviation, you can simply type "s" or "st", for instance, and the system will acknowledge that you really mean "status.

My original implementation went something like this (in perl):

# $string is some word that has been set beforehand.
if ($string =~ m/^s(t(a(t(u(s)?)?)?)?)?$/i) {
	# execute code for the status command
	# ...
} elsif ($msg =~ m/^h(e(l(p)?)?)?$/i) {
	# execute code for the help command
	# ...
} else {
	# unknown command, yadda yadda yadda
}
# ...

Obviously typing all that for a regular expression match allowing abbreviation is not only annoying to type but terribly difficult to change and maintain - you have to count parenthesis and question marks, etc. Ugh! Want more pain? Try using backreferences. Yeah, how about this example trapping extra arguments to the command:

my $string = "help fishing";
#...
} elsif ($string =~ m/^h(e(l(p)?)?)? (.*)$/i) {
	print "User requested help for $4\n";
}

If you aren't familiar with backreferences, $4 refers to the fourth group match in that regular expression - count the open parenthesis and you'll see that the (.*) is the fourth group. This is particularly nasty because for every command you want to grok arguments for (atleast in this particular fashion) you have to count how many groups you have.

So, that being a silly idea, there are atleast two other options we can use. One way is to essentially swap the two parameters in our regex comparison. Instead of seeing if $string is a shortened version version of "status", see if $string matches $status.Check this:

my $cmd = "status";
if ("status" =~ m/^$cmd/) {
	# $cmd is "status" or is a shortened version of it.
	print "Status!\n";
} elsif ("help" =~ m/$cmd/) {
	# $cmd is "help" or is a shortened version of it.
	print "Help!\n";
}

Ok, so, we're good now, right? No painful regular expressions, no painful maintanence when we want to change commands later. Neato. There is one small (major?) flaw in this design, however, it can be easily corrected. What if $cmd somehow contains characters special to regular exprsesions? Your match may fail entirely becuase of the way perl handles variables inside regular exprsesion patterns! Not to fear, there is an easy solution - perldoc perlre states that \Q will "quote (disable) pattern metacharacters till \E." So we make one small change to our expressions:

my $cmd = "status";
if ("status" =~ m/^\Q$cmd\E/) {
	# $cmd is "status" or is a shortened version of it.
	print "Status!\n";
} elsif ("help" =~ m/^\Q$cmd\E/) {
	# $cmd is "help" or is a shortened version of it.
	print "Help!\n";
}

All better. Now it doesn't matter what characters are in $cmd our expression won't fail due to improper syntax. It is extremely important to remember that.

What if you don't like lots of if-elsif-else statements? Ok, there's a solution for you. It involves using hot grep and eval action:

my @commands = qw(status help info);

# Let's say for this example the user tped "st" for "status"
my $input_command = "st";

my ($match) = grep { /^\Q$input_command\E/ } @commands;

eval "&{$match}";

sub status { print "status command called\n"; }
sub help { print "help command called\n"; }
sub info { print "info command called\n"; }

Much much shorter, and is good if you prefer it as such. This also helps split your code into simple functions instead of normal top-down code. One final problem exists: What if there is an ambiguity among two similar commands? That is, there might be two commands that, when shortened too much are the same. For example, there might be a "status" command and a "start" command. What now? Easy fix:

my @commands = qw(status help info);

# Let's say for this example the user typed "st" for "status"
my $input_command = "st";

my @matches = grep { /^\Q$input_command\E/ } @commands;


if (scalar(@matches) > 1) {
	print "The command '$input_command' is ambiguous.\n";
} elsif (scalar(@matches) == 0) {
	print "There is no such command or abbreviation '$input_command'\n";
} else {
	eval "&{$matches[0]}";
}


sub status { print "status command called\n"; }
sub help { print "help command calle\n"; }
sub info { print "info command called\n"; }

*UPDATE* A fellow perl monger by the name of John Resig was kind enough to submit some extra code that also implements term abbreviation. It's a slightly different alternative:

my $in = "help";

my $cmds = {
	status => sub { print "Status!\n"; },
	help => sub { print "Help!\n"; },
	helpme => sub { print "Help Me!\n"; }
};

my @m = grep {/^\Q$in\E/i} keys %{$cmds};

if ( @m > 1 && (!exists $cmds->{$in}) ) {
	print "Ambigious!\n";
} elsif ( @m == 0 ) {
	print "No Such!\n";
} else {
	&{$cmds->{$m[0]}};
}

Whew! Now we have a smart solution that's terribly easy to maintain.

Until next time... later :)