Jordan Sissel
geek

Wed, 28 Feb 2007

Alexa data is confusing

According to alexa.com:
  • Traffic Rank for semicomplete.com: 559,376
  • Rank in Spain: 13,257
Google Analytics believes this to be different. While Analytics doesn't give me traffic rank, it does tell me that the majority of the viewers come from the US and some of Europe (Mostly UK, North East and West US). Not Spain. Analytics watches everyone who comes to semicomplete.com.

I wonder where Alexa is getting its data from.

Update - Brock pointed out that since Spain is smaller than the US, that might account for my extremely high rank there. I left out some data because I was being lazy in this post; here it is:

This says that 41% of the viewers come from Spain. Again, Google Analytics disagrees. My only guess at this point is that, of those who visit my site, Alexa is able to track Spaniards better. Maybe it's some toolbar thing that reports usage data. A brief look around alexa.com doesn't turn up anything useful.

Googling shows this which has a link to Alexa's explanation. Yep. Toolbar.

Comments: 5 (view comments)
Permalink: /geekery/alexa-is-confusing
posted at: 16:23

Mon, 26 Feb 2007

grok pattern match predicates

I've added predicate tests to grok's pattern match system. These predicates allow you to specify an additional requirement on any matched patterns. Here's the grammar:
  '%' pattern_name [ ':' subname ] [ operator value ] '%'
The difference is that now you can put an operator and a value on the end of the pattern. The following are valid operators: < > <= >= == ~

== < > <= >=
Match equals, less than, etc. Should be obvious. One special note: if both the match and predicate values are numbers, the comparison is done using Perl's numeric comparison operators. Otherwise, string comparators are used (eq, lt, gt, etc).
~
Regular expression match.
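
Here is a rough sketch of the predicate rules above, written in Python rather than grok's Perl. The names (predicate_holds, COMPARATORS) are made up for illustration and say nothing about how grok implements this internally; the only behavior taken from the post is "compare numerically when both sides are numbers, otherwise compare as strings, and treat ~ as a regex match".

```python
import operator
import re

# Comparison operators from the grammar above, mapped to Python functions.
COMPARATORS = {
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def _as_number(text):
    """Return text as a float, or None if it does not parse as a number."""
    try:
        return float(text)
    except ValueError:
        return None


def predicate_holds(matched, op, value):
    """Decide whether a matched string satisfies the predicate `op value`."""
    if op == "~":
        # Regex predicate: accept a bare pattern or one written as /pattern/.
        if len(value) >= 2 and value.startswith("/") and value.endswith("/"):
            value = value[1:-1]
        return re.search(value, matched) is not None

    left, right = _as_number(matched), _as_number(value)
    if left is not None and right is not None:
        # Both sides are numeric: compare as numbers.
        return COMPARATORS[op](left, right)
    # Otherwise fall back to plain string comparison.
    return COMPARATORS[op](matched, value)
```

The numeric rule matters: compared as strings, "9" sorts after "100000", so a check like NUMBER>100000 would let small numbers through.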

Still confused? Let's run through some examples.

  1. Let's find out what's going on in our auth.log on any day from 20:00 to 20:09:
    % sudo cat /var/log/auth.log | ./grok -m '%TIME~/^20:0[0-9]/%'
    Sep 15 20:05:24 nightfall sshd[503]: Server listening on :: port 22.
    Sep 15 20:05:24 nightfall sshd[503]: Server listening on 0.0.0.0 port 22.
    Sep 15 20:07:31 nightfall login: login on ttyv0 as jls
    Nov 12 20:09:42 nightfall xscreensaver[647]: FAILED LOGIN 1 ON DISPLAY ":0.0", FOR "jls"
    Nov 26 20:07:18 nightfall sshd[494]: Server listening on :: port 22.
    Nov 26 20:07:18 nightfall sshd[494]: Server listening on 0.0.0.0 port 22.
      
  2. How about looking through 'netstat -s' output for big numbers? Yes, you can use awk for this particular example.
    % netstat -s | ./grok -m "%NUMBER>100000%"
            130632 total packets received
            130465 packets for this host
            114759 packets sent from this host
      
  3. Let's look in "all.log" (all syslog stuff goes here) for sshd lines with an IP starting with '83.'
    % ./grok -m "%SYSLOGBASE~/sshd/% .* %IP~/^83\./%" -r "%SYSLOGDATE% %IP%" < all.log
    Oct 17 09:54:37 83.170.72.199
    Oct 17 09:54:53 83.170.72.199
    Oct 17 09:56:02 83.170.72.199
    <snip some output >
    Apr 16 06:54:52 83.14.104.202
    Apr 16 06:54:53 83.14.104.202
    Apr 16 06:54:54 83.14.104.202
If you're interested in playing with this new feature, download grok-20070226.

This seems pretty powerful. Next feature I need to add is the ability to add predicates to patterns after they've been specified. Something like this would be sweet:

% ./grok -m "%APACHELOG%" -p "%NUMBER:RESPONSE==404%"
< some output showing you all apache log entries with response code 404 >
That would let you modify the %NUMBER:RESPONSE% pattern after the fact, adding a predicate requiring that it be 404.

Comments: 0 (view comments)
Permalink: /geekery/grok-pattern-predicates
posted at: 06:40

Sat, 24 Feb 2007

grok 20070224 released.

It's been almost a year since the first release of grok. I've finally found some energy to put into the project and it's time for another release.

Download: grok-20070224.tar.gz

A quick summary of the changelist (which comes with the tarball):

  • Lots of doc updates. More examples in the manpage.
  • Lots of new builtin patterns
  • More new filters like strftime, ip2host, and uid2user.
  • Fancier syslog matching options
  • New flags -m and -r. See this post about this change
  • filelist, catlist, and filecmd thanks mostly to Canaan Silberberg.
  • More tests to make sure that it works. Find these in the 't' directory in the grok tarball.
Email me if the tests provided don't work.

Comments: 0 (view comments)
Permalink: /geekery/grok-new-release-20070224
posted at: 04:16

Tue, 20 Feb 2007

grok + netstat

What hosts is this machine connected to:
% netstat -anfinet | perl grok -m "%IP:S%.*?%IP:D%" -r "%IP:D|ip2host%" | sort | uniq
fury.csh.rit.edu
mc-in-f104.google.com
mc-in-f147.google.com
scorn.csh.rit.edu
I have no idea what the mc-in-f104 stuff is, but firefox is open to 'www.google.com' right now. Let's find out what 'www.google.com' points at:
% host www.google.com | perl grok -m "%IP%" -r "%IP|ip2host%"
mc-in-f147.google.com
mc-in-f99.google.com
mc-in-f104.google.com
I keep finding more uses for grok now that you can use it on the commandline easily.
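
For comparison, here is roughly the same pipeline as a Python sketch: take the second IP on each netstat line, reverse-resolve it, then sort and de-duplicate. The regex and function names are my own, and the resolver is passed in as a parameter so the logic can be tried without touching DNS; none of this is grok's code.

```python
import re
import socket

IPV4 = r"(?:\d{1,3}\.){3}\d{1,3}"
# The first IP on the line is the local side, the second is the remote side.
PAIR = re.compile(r"(?P<src>%s).*?(?P<dst>%s)" % (IPV4, IPV4))


def reverse_lookup(ip):
    """Return the PTR name for ip, or the ip itself if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip


def remote_hosts(lines, resolve=reverse_lookup):
    """Return the sorted, de-duplicated names of the remote ends."""
    names = set()
    for line in lines:
        found = PAIR.search(line)
        if found:
            names.add(resolve(found.group("dst")))
    return sorted(names)
```

Feeding it the lines of 'netstat -anfinet' output should give a list comparable to the one above, minus anything without a PTR record, which comes back as a bare IP.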

Comments: 0 (view comments)
Permalink: /geekery/grok-and-netstat
posted at: 04:13

Mon, 19 Feb 2007

grok for apache log analysis

I recently made a small change to my rss and atom feeds. I add a tracker image in the content. It looks like this:
<img src="/images/spacer.gif?path_of_a_blog_entry">
Any time someone views a post in an RSS feed, that image is loaded and the client browser happily reports the referrer url and I get to track you! Wee.

This is in an effort to find out how many people actually read my blog. Now that I can track viewership of the rss/atom feeds, how do I go about analyzing it? grok to the rescue:

% grep -F 'spacer.gif?' accesslog \
   | perl grok -m '%APACHELOG%' -r '%IP% %QUOTEDSTRING:REFERRER%' \
   | sort | uniq -c | sort -n
<IP addresses blotted out, only a few entries shown>
  1 XX.XX.XX.XX "http://www.google.com/reader/view/"
  9 XX.XX.XX.XX "http://whack.livejournal.com/friends"
  10 XX.XX.XXX.XXX "http://www.bloglines.com/myblogs_display?sub=44737984&site=6302113"
  10 XX.XXX.XXX.XX "http://www.semicomplete.com/?flav=rss20"
  27 XX.XXX.XXX.XX "http://whack.livejournal.com/friends"
Each line represents a unique viewer, and tells me what the reader was using to view the feed.
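
The same count can be sketched in Python. The regex below is my approximation of the usual "combined" access log layout, not grok's APACHELOG pattern, and feed_readers is a hypothetical name.

```python
import re
from collections import Counter

# Approximate "combined" access log layout:
# ip ident user [date] "request" status size "referrer" "user agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def feed_readers(lines, marker="spacer.gif?"):
    """Count tracker-image hits per (client IP, referrer) pair."""
    counts = Counter()
    for line in lines:
        if marker not in line:
            continue  # not a hit on the tracker image
        hit = COMBINED.match(line)
        if hit:
            counts[(hit.group("ip"), hit.group("referrer"))] += 1
    return counts
```

Calling most_common() on the result gives the same ranking as the 'sort | uniq -c | sort -n' tail of the pipeline, largest first.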

Yay grok.

Comments: 0 (view comments)
Permalink: /geekery/grok-blog-analysis
posted at: 22:53

Sun, 18 Feb 2007

grok - now with more steroids

I added a new filter to grok: strftime. It takes the same format strings as strftime(3), but you need to use & instead of %, i.e. strftime("&D"). This is useful when combined with parsedate. I also added a new default pattern, APACHELOG, which will match a standard apache log entry.

Along with that little addition comes another, way cooler addition, which lets you use grok entirely from the command line in a way resembling grep on crack. For this we needed 2 new command line flags: -m and -r.

  • -m : specify a match string
  • -r : specify a reaction string. Defaults to "%=LINE%" if omitted.

The reaction string specifies what is printed on a match. There is no support (yet?) for specifying reactions other than printing out data. If you want a command to be executed, you could use a clever combination of the shdq filter and have grok output shell commands. More on that later on.
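
To make the match/reaction idea concrete, here is a toy version in Python. It is a sketch under heavy assumptions: the PATTERNS table is a tiny stand-in for grok's real pattern definitions, filters like |parsedate are not handled, and a pattern name may appear only once per match string.

```python
import re

# A tiny, hypothetical pattern table; grok ships its own, richer definitions.
PATTERNS = {
    "IP": r"(?:\d{1,3}\.){3}\d{1,3}",
    "NUMBER": r"-?\d+(?:\.\d+)?",
    "WORD": r"\w+",
}

MATCH_TOKEN = re.compile(r"%([A-Z]+)%")
REACTION_TOKEN = re.compile(r"%(=?[A-Z]+)%")


def compile_match(template):
    """Turn a match string such as "from %IP%" into a compiled regex."""
    parts, pos = [], 0
    for token in MATCH_TOKEN.finditer(template):
        parts.append(template[pos:token.start()])  # literal text stays regex
        name = token.group(1)
        parts.append("(?P<%s>%s)" % (name, PATTERNS[name]))
        pos = token.end()
    parts.append(template[pos:])
    return re.compile("".join(parts))


def react(line, matcher, reaction="%=LINE%"):
    """Return the reaction text for line, or None when the line does not match."""
    found = matcher.search(line)
    if found is None:
        return None
    values = dict(found.groupdict())
    values["=LINE"] = line  # the whole input line, as in the default reaction

    def fill(token):
        # Unknown names are left untouched rather than raising an error.
        return values.get(token.group(1), token.group(0))

    return REACTION_TOKEN.sub(fill, reaction)
```

With the default reaction this behaves like grep (print the whole line); with a reaction of "%IP%" it prints just the captured address.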

The implementation of this is somewhat klunky, but it works. Under the hood, here's what happens:

% grok -m FOO -r BAR
grok takes this and generates the following config in memory:
  exec "cat" {
    type "all" {
      match = "FOO";
      reaction = { print meta2string("BAR", $v); };
    };
  };
The data source being read from is output from cat, which is just a lame hack to trick grok into reading a file from stdin. This is really useful. Let's try a few examples:

Grep a file for anything looking like an IP:

We have a log file with IPs in it. Writing a regex to grab any line with an IP on it is annoying. Let's use grok:
% perl grok -m "%IP%" < /var/log/messages | head -5
Feb  7 19:50:52 kenya dhcpd: Forward map from D962WZ71.home to 192.168.10.189 FAILED: Has an A record but no DHCID, not mine.
Feb  7 19:50:52 kenya dhcpd: Forward map from D962WZ71.home to 192.168.10.189 FAILED: Has an A record but no DHCID, not mine.
Feb  7 22:17:16 kenya named[17044]: stopping command channel on 127.0.0.1#953
Feb  7 22:17:18 kenya named[16239]: command channel listening on 127.0.0.1#953
Feb  9 05:11:17 kenya sshd[22002]: error: PAM: authentication error for root from 211.147.17.110
At this point, grok is behaving much like grep, but you get all of the easy matching power of grok.

Syslog messages with IPs + extra text processing

How about if we want any syslog message with an IP in it, and we want to know what date and program logged it?
% perl grok -m "%SYSLOGBASE% .* %IP%" -r "%SYSLOGDATE|parsedate|strftime('&D')% %PROG% %IP%\n" < /var/log/messages | head -5
02/07/07 dhcpd 192.168.10.189
02/07/07 dhcpd 192.168.10.189
02/07/07 named 127.0.0.1
02/07/07 named 127.0.0.1
02/09/07 sshd 211.147.17.110
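
The parsedate|strftime('&D') step boils down to "parse a syslog timestamp, print it as MM/DD/YY". A minimal Python sketch of that conversion follows. Syslog timestamps carry no year, so the year is an explicit argument here; that is my assumption, not something grok's parsedate filter necessarily does. Month-name parsing also assumes an English/C locale.

```python
from datetime import datetime


def syslog_date_to_mdy(stamp, year):
    """Reformat a syslog timestamp such as "Feb  7 19:50:52" as MM/DD/YY."""
    # Syslog pads single-digit days with a space; collapse runs of whitespace.
    normalized = " ".join(stamp.split())
    # Parse with the year attached so that Feb 29 validates correctly.
    parsed = datetime.strptime(
        "%d %s" % (year, normalized), "%Y %b %d %H:%M:%S"
    )
    return parsed.strftime("%m/%d/%y")
```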

Process apache logs

What about the new APACHELOG pattern? Here's a sample usage of it:
% tail -5 access | perl grok -m "%APACHELOG%" -r "%HTTPDATE|parsedate% %QUOTEDSTRING:URL|httpfilter%\n"
1171799519 /blog/geekery/grok-like-grep.html?source=rss20
1171799581 /projects/solaudio
1171799624 /projects/solaudio/
1171799651 /~psionic/seminars/vi/viseminar.html
1171799652 /seminars/vi/viseminar.html

Break a file into parts grouped by IP

What if you want to find out what IP causes the most log chatter? Use the shdq filter and have grok output shell commands which you then pipe to /bin/sh.
% cat /var/log/messages | perl grok -m '%IP%' -r 'echo "%=LINE|shdq%" >> /tmp/log.%IP%'  | sh
% ls /tmp/log.*
/tmp/log.127.0.0.1              /tmp/log.211.147.17.110
/tmp/log.192.168.0.254          /tmp/log.71.70.243.218
% wc -l /tmp/log.*
       4 /tmp/log.127.0.0.1
       1 /tmp/log.192.168.0.254
      70 /tmp/log.211.147.17.110
       1 /tmp/log.71.70.243.218
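
The important part of that trick is the quoting: log lines are untrusted text, and they end up inside shell commands. Here is a Python sketch of the same "emit shell commands" idea, using shlex.quote in place of grok's shdq filter (a different quoting scheme, single quotes rather than escaping inside double quotes). The function name and the /tmp/log. prefix default are only for illustration.

```python
import re
import shlex

IPV4 = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")


def append_commands(lines, prefix="/tmp/log."):
    """Yield one shell command per matching line, appending it to a per-IP file."""
    for line in lines:
        found = IPV4.search(line)
        if found:
            yield "echo %s >> %s" % (
                shlex.quote(line),  # keeps $, ;, quotes, etc. literal
                shlex.quote(prefix + found.group(0)),
            )
```

Printing each yielded command and piping the result to sh would mirror the pipeline above.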

Latest version (potentially unstable, but the above examples work):

Download and enjoy: grok-20070218.tar.gz

Comments: 0 (view comments)
Permalink: /geekery/grok-like-grep
posted at: 08:04

Wed, 14 Feb 2007

Vertical tabs in Firefox 2

Update: Vertigo has been released for Firefox 2! Yay :)

The 'Vertigo' extension doesn't work in Firefox 2. Some googling finds a few solutions, all of which suck. That said, I think I'm going to dive back into playing with firefox and make an extension.

So far I've managed to get vertical tabs with a scrollbar that pops up when there are more-than-displayable tabs open. However, much of tonight left me extremely frustrated.

Development with Firefox seems to be exceedingly dependent on trial-and-error. Save whatever files, restart firefox. Repeat. Repeat. Repeat. Firefox is not lightning quick to start up, and I'm not sure how to edit extensions that are currently running without a restart. Maybe there's a debugger I don't know about. Mostly I'd just like to explore the DOM while it's running (Firefox's XUL DOM, not the current web page).

All I wanted to add (tonight) was the ability to choose what side of the browser the tab bar went on.

The following CSS will move the bar to the right (with my extension):

#appcontent tabbox {
  -moz-box-direction: reverse;
}
Setting <tabbox dir="reverse"> in the XUL works too, but I need to set this from JavaScript.

This means tabbox.style.MozBoxDirection = "reverse" should work, right? Here's everything I tried:

var tabbox = document.getElementsByTagName("tabbox")[0];

// Doesn't work (trying either 'reverse' or 'rtl'):
tabbox.style.MozBoxDirection = "reverse";
tabbox.style.direction = "reverse";
tabbox.dir = "reverse";
tabbox.direction = "reverse";

//Try to tell the vbox  (tab list) to order after/before the browser pane:
tabbox.childNodes[0].ordinal = 0;
tabbox.childNodes[0].ordinal = 2;
I'm at a total loss. My lack of familiarity with XUL is hurting me here. What's confusing is that the following code outputs "ltr" (left to right), meaning tabbox.style.direction = "rtl" should work:
  var x = window.getComputedStyle(tabbox, "");
  alert(x.getPropertyValue("direction"));
Googling for 'tabbox dir' and other variants doesn't show much promise. Wrapping the contents of the tabbox in an hbox and attempting to tweak the direction of the hbox fails, too.

The following code produces something interesting:

alert(tabbox.childNodes[0].nodeName + " / " + tabbox.childNodes[1].nodeName);
The output is "tabs / tabpanel". It should be "vbox / splitter" or something close to that.

Further investigation lands me at gBrowser.mTabBox, which has the correct children (the full XUL DOM within the real tabbox). Its childNodes[0] is a vbox, as expected, but only when I go through mTabBox, not through the tag lookup.

gBrowser.mTabBox.dir = "reverse";
And voila, the tab bar is on the right.

I'm not sure why these two expressions refer to different objects:

 document.getElementsByTagName("tabbox")[0] != gBrowser.mTabBox
Very strange... These should point to the same object, and while they both are 'tabbox' elements, their children are quite different (the former is an element-trimmed version containing only tabs and tabpanel).

Anybody? ;)

Comments: 4 (view comments)
Permalink: /geekery/firefox-2-vertical-tabs-extension-stuff
posted at: 04:13

Wed, 07 Feb 2007

Mini-FreeBSD script

I wrote a script a while ago to build a very tiny FreeBSD world. It's extremely fast, and the image it builds takes only about 10 megs of space. It lets you quickly create new jail environments or system images for small embedded platforms.

If you look at the script itself, you'll get an idea of what it installs. I used a variant of this script to build the system I run on my Soekris net4501 which runs FreeBSD and is under 20 megs.

There are lots of "make a small freebsd system" scripts, but most of the ones I've found rely heavily on 'buildworld' and what not. This takes a live system and copies the binaries you need, then uses ldd(1) to track down required libraries.
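
The ldd(1) step is the interesting bit, so here is a small Python sketch of it: given the text ldd prints for a binary, pull out the library paths that would need to be copied alongside it. This is not taken from minibsd.sh (which is a shell script); the function name and regex are my own.

```python
import re

# Matches the "=> /path/to/lib.so.N" part of an ldd(1) output line.
RESOLVED = re.compile(r"=>\s*(/\S+)")


def libraries_from_ldd(ldd_output):
    """Return the sorted library paths named in the text printed by ldd(1)."""
    paths = set()
    for line in ldd_output.splitlines():
        found = RESOLVED.search(line)
        if found:
            paths.add(found.group(1))
    return sorted(paths)
```

Run it over the ldd output of every binary you copy, take the union of the results, and you have the list of libraries the image needs.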

view minibsd.sh

Example usage:

kenya(~/t) % rm -rf ./soekris/
kenya(~/t) % time sudo ./minibsd.sh
sudo ./minibsd.sh   0.16s user 0.65s system 61% cpu 1.326 total
kenya(~/t) % sudo chroot ./soekris /bin/sh
# pwd
/
# exit
Simple jail config (rc.conf):
jail_enable="YES"
jail_list="test"
jail_test_rootdir="/home/jls/t/soekris"
jail_test_hostname="test"
jail_test_ip="10.1.1.1"
jail_test_interface="tl1"
Put something simple in this jail's rc.conf (/home/jls/t/soekris/etc/rc.conf):
sshd_enable="YES"
sendmail_enable="NONE"
Let's test the jail now:
kenya(~/t) % sudo /etc/rc.d/jail start
Configuring jails:.
Starting jails: 
At this point, it's probably hung (assuming you enabled sshd). If you hit CTRL+T you'll see what command has the foreground and what it's doing.* It hangs because ssh-keygen is prompting you for entropy (the prompt itself is directed to JAILROOT/var/log/console.log). Smash a few keys then hit enter. It'll finish eventually.
kenya(~/t) % sockstat -4 | grep 10.1.1.1:22 
root     sshd       2258  3  tcp4   10.1.1.1:22           *:*
Our sshd is running happily inside that jail we made. This whole process took about 5 minutes.

* FreeBSD's CTRL+T terminal handler feature has to be the best thing ever invented. I wish Linux had something like this. Here's what hitting CTRL+T when running cat looks like:

kenya(~) % cat
load: 0.45  cmd: cat 2324 [ttyin] 0.00u 0.00s 0% 600k
load: 0.42  cmd: cat 2324 [ttyin] 0.00u 0.00s 0% 600k
It clearly shows you the command name, the pid, and the syscall-type-thing it's doing. Clearly cat is waiting for input from the tty. <3 FreeBSD.

Comments: 10 (view comments)
Permalink: /geekery/mini-freebsd-script
posted at: 03:27

Tue, 06 Feb 2007

Mysql slave server-id selection

For a current project, I need the ability to dynamically grow and shrink a pool of mysql slaves. In order for replication to work properly, every slave must have a unique server id. When you want to grow another slave, how do you choose the server id?

Two slaves with the same server id will replicate successfully, but when they reach the end of the master's binary log, something freaks out and forces them to disconnect. This causes both slaves to reconnect, sync (no data needed), and have the connection die off quickly again. The result of this is rapid connection/disconnection by both slaves driving the load to 1+ on both slaves, and to around .3 on the master even in a completely idle system. This is bad. Therefore, server id collisions are bad.

A simple approach might be to pick a random number. However, depending on your range, collisions may still occur. If there's even a slight chance of collision, you have to detect that collision and try a new number. Collision detection is expensive and can be done one of a few ways:

  • Query all slaves with "show global variables like 'server_id'" and compare the results against the chosen id. This has O(n) runtime, and doesn't scale.
  • Set the server id to whatever you picked at random, have a heuristic tool that can detect the behavior that happens when two server ids collide. This is obviously a horrible idea.
Random choice doesn't seem to be very good. Scanning all slaves and picking an id that isn't in the set of known ids is also bad, as mentioned above. So what now?
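
To put a number on "collisions may still occur": this is the birthday problem. Here is a quick Python sketch, assuming ids are drawn uniformly at random from the whole 32-bit range (the function name and the id_space parameter are mine).

```python
def collision_probability(slaves, id_space=2 ** 32):
    """Chance that at least two of `slaves` uniformly random ids coincide."""
    no_collision = 1.0
    for already_taken in range(slaves):
        # The next slave must avoid every id that is already taken.
        no_collision *= (id_space - already_taken) / id_space
    return 1.0 - no_collision
```

Over the full 32-bit range the odds are tiny for a modest pool, but they are never zero, and they climb quickly if you pick from a smaller range.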

We need a number that will never repeat. You might think about using a small table in the master with an auto_increment column and always get a new id that way, but why? Time is always increasing. As a bonus, mysql's server-id is an unsigned 32-bit value, so Unix epoch values will fit until 2106.

A trivial script can generate your my.cnf whenever you bring up a new slave, using the current time as the server id, and you're pretty much guaranteed never to have a collision unless you bring up two slaves in the same second (how likely is that?).

Simple mysql config:

# my.cnf.in
server-id=SERVERID
Simple script to generate a config with a proper serverid:
#!/bin/sh

m4 -DSERVERID=`date +%s` my.cnf.in > /etc/my.cnf
Make this part of your "add a new mysql slave" setup and you'll have a scalable server-id selection system.
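
If m4 isn't handy, the same substitution is a few lines of Python. This sketch assumes the same SERVERID placeholder as my.cnf.in above; the function name and the range check are my additions.

```python
import time

MAX_SERVER_ID = 2 ** 32 - 1  # server-id is an unsigned 32-bit value


def render_my_cnf(template, now=None):
    """Replace the SERVERID placeholder with the current Unix time."""
    server_id = int(time.time() if now is None else now)
    if not 0 < server_id <= MAX_SERVER_ID:
        raise ValueError("timestamp %d does not fit in a server-id" % server_id)
    return template.replace("SERVERID", str(server_id))
```

The now parameter exists only so the output is reproducible when testing; in real use you would leave it out and write the result to /etc/my.cnf.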

Alternatively, since mysql server-id values are, again, 32 bit, you can simply use the IP address of the machine itself. Something like this:

#!/usr/bin/perl
# Turn an IP into an integer for use with mysql server IDs (or whatever)

# Each octet is weighted by 256^3, 256^2, 256^1, 256^0 from left to right.
$exp = 3;
map { $x += $_ * (2 ** (8 * $exp--)) } split(/\./, $ARGV[0]);
print "$x\n";
I named it ip2int.pl. You can use Socket's inet_aton and unpack to achieve the same result here.
./ip2int.pl 129.21.60.5
2165652485
Since IPs are in theory unique, you can use the IP of the mysql server for its own server ID.
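
For the record, the inet_aton-and-unpack route mentioned above looks like this in Python (the Perl version with Socket would be analogous). The function name is mine.

```python
import socket
import struct


def ip_to_int(address):
    """Return a dotted-quad IPv4 address as an unsigned 32-bit integer."""
    # inet_aton yields 4 bytes in network (big-endian) order;
    # "!I" reads them back as one unsigned 32-bit integer.
    return struct.unpack("!I", socket.inet_aton(address))[0]
```

For 129.21.60.5 this gives 2165652485, the same value the Perl script prints.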

Comments: 0 (view comments)
Permalink: /geekery/simple-mysql-slave-id-scaling
posted at: 02:58

Sat, 03 Feb 2007

Paperback adventures

After dinner tonight, Wendy and I took a side trip to Borders. I spent a little while looking online for a decent "things you should know as a consultant/independent contractor" book, but none of the ones that looked promising were in the store.

I gave up and wandered around some more and landed in the computer section. Turns out tech books haven't gotten any better over the years. There are entire shelves dedicated to things I want to know the least about: Excel, Vista, Myspace, AJAX. I haven't bought a tech reference book in ages for the simple reason that they all suck. Sure, they've got useful information, but my questions are answered much more quickly by a few quick Google searches.

Something in me said "get a book" - which is strange, because I usually can't find the time to read. Excluding one book, I haven't read anything in full since high school.

That one book was Silence on the Wire, a great book on passive reconnaissance. It wasn't a novel, it was a technical book. Rather, it was a technical narrative. It read like a novel, but the content was similar to a reference manual. It was well written, and enjoyable to read. If you're a technical guy, and have some interest in security, then check the book out. Totally worth the read, and bonus that I learned a few things.

Anyway, back at Borders. I was out of place here among the stacks of Excel, Myspace, and AJAX for Dummies. Blah. On the top shelf was "Code 2.0" the 2nd edition of Lawrence Lessig's original. I read the preface, and it looked interesting. I'll post a review when (if?) I finish it.

Back at my original thought - am I the only one who fails to find real value in most tech books? I had bad experiences with "learn this programming language" books. I just can't get into them. Most of them fail for various reasons. Most programming books have the first 6 chapters filled with the same data as all the rest:

  • Chapters 1-3: Computers are not scary.
  • Chapters 4-5: You can make computers do things!
  • Chapter 6: This is a variable. This is an if statement.
  • Chapter 7: Oh, you're still here? Hmm, Guess I should start talking about how to use Python.
  • Chapter 8: Hello World!
  • Chapter 9: Thanks for your $50.

Speaking of Python: I highly recommend diveintopython.org if you want to learn it and are already familiar with programming. Buy the book or read it online - choice is good.

I've found the most value in pocket-type references, for the same reason short papers are often better written and more informative than longer ones: you've got to cut everything that isn't absolutely necessary. I wish more books did this. Who wants a 1200 page book on Microsoft Word anyway?

Comments: 3 (view comments)
Permalink: /geekery/new-book
posted at: 04:46
