According to alexa.com:
- Traffic Rank for semicomplete.com: 559,376
- Rank in Spain: 13,257
Google Analytics believes this to be different. While Analytics doesn't give me
traffic rank, it does tell me that the majority of the viewers come from the US
and some of Europe (Mostly UK, North East and West US). Not Spain. Analytics
watches everyone who comes to semicomplete.com.
I wonder where Alexa is getting its data from.
Update - Brock pointed out that since Spain is smaller than the US, that might account for my extremely high rank there. I left out some data because I was being lazy in this post; here it is:
This says that 41% of the viewers come from Spain. Again, Google Analytics
disagrees. My only guess at this point is that of those who visit my site,
Alexa is able to track Spaniards better? Maybe it's some toolbar thing that
reports usage data. A brief look around alexa.com doesn't turn up anything
useful.
Googling shows this, which has a link to Alexa's explanation. Yep. Toolbar.
Comments: 5 (view comments)
Tags: alexa, data mining, traffic
Permalink: /geekery/alexa-is-confusing
posted at: 16:23
I've added predicate tests to grok's pattern match system. These predicates
allow you to specify an additional requirement on any matched patterns. Here's
the grammar:
'%' pattern_name [ ':' subname ] [ operator value ] '%'
The difference is that now you can put operator and values on the end of the
pattern. The following are valid operators: < > <= >= == ~
- == < > <= >=: Match equals, less than, etc. Should be obvious. One special note is
that if both the match and predicate values are numbers, then the comparison
is done using perl's numerical compare operators. Otherwise, string
comparators are used (eq, lt, gt, etc).
- ~: Regular expression match.
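To see why the numeric-versus-string distinction matters, here is a rough shell sketch of the rule described above. It is not grok's code (grok is perl); the names is_number and less_than are made up for illustration, and "number" here only means a non-negative integer, which is narrower than what perl accepts.

```shell
# Hypothetical helper: succeed only if the argument is a non-negative integer.
is_number() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical helper: numeric compare when both sides are numbers,
# string compare otherwise (mirrors the "<" predicate rule described above).
less_than() {
    if is_number "$1" && is_number "$2"; then
        [ "$1" -lt "$2" ]
    else
        [ "$1" \< "$2" ]
    fi
}
```

With this, "less_than 9 10" succeeds because both values are compared as numbers. Compared as strings, "9" sorts after "10", which is why "less_than 9 1a" fails.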
Still confused? Let's run through some examples.
- Let's find out what's going on in our auth.log on any day from 20:00 to 20:09:
% sudo cat /var/log/auth.log | ./grok -m '%TIME~/^20:0[0-9]/%'
Sep 15 20:05:24 nightfall sshd[503]: Server listening on :: port 22.
Sep 15 20:05:24 nightfall sshd[503]: Server listening on 0.0.0.0 port 22.
Sep 15 20:07:31 nightfall login: login on ttyv0 as jls
Nov 12 20:09:42 nightfall xscreensaver[647]: FAILED LOGIN 1 ON DISPLAY ":0.0", FOR "jls"
Nov 26 20:07:18 nightfall sshd[494]: Server listening on :: port 22.
Nov 26 20:07:18 nightfall sshd[494]: Server listening on 0.0.0.0 port 22.
- How about looking through 'netstat -s' output for big numbers? Yes, you
can use awk for this particular example.
% netstat -s | ./grok -m "%NUMBER>100000%"
130632 total packets received
130465 packets for this host
114759 packets sent from this host
- Let's look in "all.log" (all syslog stuff goes here) for sshd lines with
an IP starting with '83.'
% ./grok -m "%SYSLOGBASE~/sshd/% .* %IP~/^83\./%" -r "%SYSLOGDATE% %IP%" < all.log
Oct 17 09:54:37 83.170.72.199
Oct 17 09:54:53 83.170.72.199
Oct 17 09:56:02 83.170.72.199
<snip some output >
Apr 16 06:54:52 83.14.104.202
Apr 16 06:54:53 83.14.104.202
Apr 16 06:54:54 83.14.104.202
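For comparison, the netstat example above can be approximated with awk, as mentioned. This is only a sketch: big_numbers is a hypothetical helper name, and it only looks at whitespace-separated fields that are plain integers, which is cruder than grok's %NUMBER% pattern.

```shell
# Print every input line that has at least one integer field
# greater than the threshold given as the first argument.
big_numbers() {
    awk -v min="$1" '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[0-9]+$/ && $i + 0 > min) {
                print
                next
            }
        }
    }'
}
```

Usage would look like: netstat -s | big_numbers 100000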
If you're interested in playing with this new feature,
download grok-20070226.
This seems pretty powerful. Next feature I need to add is the ability to add
predicates to patterns after they've been specified. Something like this would be sweet:
% ./grok -m "%APACHELOG%" -p "%NUMBER:RESPONSE==404%"
< some output showing you all apache log entries with response code 404 >
That would let you modify the %NUMBER:RESPONSE% pattern to add a predicate
requiring that it be 404.
Comments: 0 (view comments)
Tags: grok
Permalink: /geekery/grok-pattern-predicates
posted at: 06:40
It's been almost a year since the first release of grok. I've finally found
some energy to put into the project and it's time for another release.
Download: grok-20070224.tar.gz
A quick summary of the changelist (which comes with the tarball):
- Lots of doc updates. More examples in the manpage.
- Lots of new builtin patterns
- More new filters like strftime, ip2host, and uid2user.
- Fancier syslog matching options
- New flags -m and -r. See this post about this change
- filelist, catlist, and filecmd thanks mostly to Canaan Silberberg.
- More tests to make sure that it works. Find these in the 't' directory in the grok tarball.
Email me if the tests provided don't work.
Comments: 0 (view comments)
Tags: grok, projects, releases
Permalink: /geekery/grok-new-release-20070224
posted at: 04:16
What hosts is this machine connected to?
% netstat -anfinet | perl grok -m "%IP:S%.*?%IP:D%" -r "%IP:D|ip2host%" | sort | uniq
fury.csh.rit.edu
mc-in-f104.google.com
mc-in-f147.google.com
scorn.csh.rit.edu
I have no idea what the mc-in-f104 stuff is, but firefox is open to 'www.google.com' right now. Let's find out what 'www.google.com' points at:
% host www.google.com | perl grok -m "%IP%" -r "%IP|ip2host%"
mc-in-f147.google.com
mc-in-f99.google.com
mc-in-f104.google.com
I keep finding more uses for grok now that you can use it on the commandline easily.
Comments: 0 (view comments)
Tags: grok
Permalink: /geekery/grok-and-netstat
posted at: 04:13
I recently made a small change to my rss and atom feeds. I added a tracker image to the content. It looks like this:
<img src="/images/spacer.gif?path_of_a_blog_entry">
Any time someone views a post in an RSS feed, that image is loaded and the client browser happily reports the referrer url and I get to track you! Wee.
This is in an effort to find out how many people actually read my blog. Now that I can track viewership of the rss/atom feeds, how do I go about analyzing it? grok to the rescue:
% grep 'spacer.gif\?' accesslog \
| perl grok -m '%APACHELOG%' -r '%IP% %QUOTEDSTRING:REFERRER%' \
| sort | uniq -c | sort -n
<IP addresses blotted out, only a few entries shown>
1 XX.XX.XX.XX "http://www.google.com/reader/view/"
9 XX.XX.XX.XX "http://whack.livejournal.com/friends"
10 XX.XX.XXX.XXX "http://www.bloglines.com/myblogs_display?sub=44737984&site=6302113"
10 XX.XXX.XXX.XX "http://www.semicomplete.com/?flav=rss20"
27 XX.XXX.XXX.XX "http://whack.livejournal.com/friends"
Each line represents a unique viewer, and tells me what the reader was using to
view the feed.
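If you'd rather count distinct viewers per feed reader, a small follow-up sketch is below. It assumes input lines of the form IP "referrer", i.e. the grok output above before it goes through sort | uniq -c, and that referrer URLs contain no spaces. readers_per_referrer is a made-up name.

```shell
# stdin: lines like:  1.2.3.4 "http://example.com/reader"
# stdout: "<distinct IP count> <referrer>", smallest count first.
readers_per_referrer() {
    sort -u | awk '{
        $1 = ""            # drop the IP, keep the referrer
        sub(/^ /, "")      # remove the leading separator left behind
        count[$0]++
    }
    END {
        for (r in count) print count[r], r
    }' | sort -n
}
```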
Yay grok.
Comments: 0 (view comments)
Tags: grok, apache logs, logs, blogging
Permalink: /geekery/grok-blog-analysis
posted at: 22:53
I added a new filter to grok: strftime. Same format strings as strftime(3)
provides, but you need to use & instead of %, e.g. strftime("&D"). This is useful
when combined with parsedate. I also added a new default pattern, APACHELOG,
which will match a standard apache log entry.
Along with that little addition comes another way cooler addition which lets
you use grok entirely from the command line in a way resembling grep on crack.
For this we needed 2 new command line flags: -m and -r.
- -m : specify a match string
- -r : specify a reaction string. Defaults to "%=LINE%" if omitted.
The reaction string specifies what is printed on a match. There is no support
(yet?) for specifying reactions other than printing out data. If you want a
command to be executed, you could use a clever combination of the shdq filter
and have grok output shell commands. More on that later on.
The implementation of this is somewhat klunky, but it works. Under the hood,
here's what happens:
% grok -m FOO -r BAR
grok takes this and generates the following config in memory:
exec "cat" {
type "all" {
match = "FOO";
reaction = { print meta2string("BAR", $v); };
};
};
The data source being read from is output from cat, which is just a lame hack
to trick grok into reading a file from stdin. This is really useful. Let's try
a few examples:
Grep a file for anything looking like an IP:
We have a log file with IPs in it. Writing a regex to grab any line with an IP on it is annoying. Let's use grok:
% perl grok -m "%IP%" < /var/log/messages | head -5
Feb 7 19:50:52 kenya dhcpd: Forward map from D962WZ71.home to 192.168.10.189 FAILED: Has an A record but no DHCID, not mine.
Feb 7 19:50:52 kenya dhcpd: Forward map from D962WZ71.home to 192.168.10.189 FAILED: Has an A record but no DHCID, not mine.
Feb 7 22:17:16 kenya named[17044]: stopping command channel on 127.0.0.1#953
Feb 7 22:17:18 kenya named[16239]: command channel listening on 127.0.0.1#953
Feb 9 05:11:17 kenya sshd[22002]: error: PAM: authentication error for root from 211.147.17.110
At this point, grok is behaving much like grep, but you get all of the easy
matching power of grok.
Syslog messages with IPs + extra text processing
How about if we want any syslog message with an IP in it, and we want to know what date and program logged it?
% perl grok -m "%SYSLOGBASE% .* %IP%" -r "%SYSLOGDATE|parsedate|strftime('&D')% %PROG% %IP%\n" < /var/log/messages | head -5
02/07/07 dhcpd 192.168.10.189
02/07/07 dhcpd 192.168.10.189
02/07/07 named 127.0.0.1
02/07/07 named 127.0.0.1
02/09/07 sshd 211.147.17.110
Process apache logs
What about the new APACHELOG pattern? Here's a sample usage of it:
% tail -5 access | perl grok -m "%APACHELOG%" -r "%HTTPDATE|parsedate% %QUOTEDSTRING:URL|httpfilter%\n"
1171799519 /blog/geekery/grok-like-grep.html?source=rss20
1171799581 /projects/solaudio
1171799624 /projects/solaudio/
1171799651 /~psionic/seminars/vi/viseminar.html
1171799652 /seminars/vi/viseminar.html
Break a file into parts grouped by IP
What if you want to find out what IP causes the most log chatter? Use the shdq
filter and have grok output shell commands which you then pipe to /bin/sh.
% cat /var/log/messages | perl grok -m '%IP%' -r 'echo "%=LINE|shdq%" >> /tmp/log.%IP%' | sh
% ls /tmp/log.*
/tmp/log.127.0.0.1 /tmp/log.211.147.17.110
/tmp/log.192.168.0.254 /tmp/log.71.70.243.218
% wc -l /tmp/log.*
4 /tmp/log.127.0.0.1
1 /tmp/log.192.168.0.254
70 /tmp/log.211.147.17.110
1 /tmp/log.71.70.243.218
Latest version (potentially unstable, but the above examples work):
Download and enjoy: grok-20070218.tar.gz
Comments: 0 (view comments)
Tags: grok
Permalink: /geekery/grok-like-grep
posted at: 08:04
Update: Vertigo has been released for Firefox 2! Yay :)
The 'Vertigo' extension doesn't work in Firefox 2. Some googling finds a few
solutions, all of which suck. That said, I think I'm going to dive back into
playing with firefox and make an extension.
So far I've managed to get vertical tabs with a scrollbar that pops up when
there are more-than-displayable tabs open. However, much of tonight left me
extremely frustrated.
Development with Firefox seems to be exceedingly dependent on trial-and-error.
Save whatever files, restart firefox. Repeat. Repeat. Repeat. Firefox is not
lightning quick to startup, and I'm not sure how to edit extensions that are
currently running without a restart. Maybe there's a debugger I don't know
about. Mostly I'd just like to explore the DOM while it's running (Firefox's
XUL DOM, not the current web page).
All I wanted to add (tonight) was the ability to choose what side of the
browser the tab bar went on.
The following CSS will move the bar to the right (with my extension):
#appcontent tabbox {
-moz-box-direction: reverse;
}
Setting <tabbox dir="reverse"> in the XUL works too, but I need
to set this from JavaScript.
This means tabbox.style.MozBoxDirection = "reverse" should work, right?
Here's everything I tried:
var tabbox = document.getElementsByTagName("tabbox")[0];
// Doesn't work (trying either 'reverse' or 'rtl'):
tabbox.style.MozBoxDirection = "reverse";
tabbox.style.direction = "reverse";
tabbox.dir = "reverse";
tabbox.direction = "reverse";
//Try to tell the vbox (tab list) to order after/before the browser pane:
tabbox.childNodes[0].ordinal = 0;
tabbox.childNodes[0].ordinal = 2;
I'm at a total loss. My lack of familiarity with XUL is hurting me here. What's
confusing, is the following code outputs "ltr" (left to right), meaning
tabbox.style.direction = "rtl" should work:
var x = window.getComputedStyle(tabbox, "");
alert(x.getPropertyValue("direction"));
Googling for 'tabbox dir' and other variants doesn't show much promise.
Wrapping the contents of the tabbox in an hbox and attempting to tweak the
direction of the hbox fails, too.
The following code produces something interesting:
alert(tabbox.childNodes[0].nodeName + " / " + tabbox.childNodes[1].nodeName);
The output is "tabs / tabpanel". It should be "vbox / splitter" or something
close to that.
Further investigation lands me at gBrowser.mTabBox, which has the
correct children (the full XUL DOM within the real tabbox).
tabbox.childNodes[0] should be a vbox, and it is, but only when I access it
through mTabBox, not through the tag lookup.
gBrowser.mTabBox.dir = "reverse";
And voila, the tab bar is on the right.
I'm not sure why these two expressions refer to different objects:
document.getElementsByTagName("tabbox")[0] != gBrowser.mTabBox
Very strange... These should point to the same objects, and while they both are 'tabbox' elements, their children are quite different (the former is an element-trimmed version containing only tabs and tabpanel).
Anybody? ;)
Comments: 4 (view comments)
Tags: firefox, late-night-hacking, oh-my-god-it's-4am
Permalink: /geekery/firefox-2-vertical-tabs-extension-stuff
posted at: 04:13
I wrote a script a while ago to build a very tiny freebsd world. It's extremely
fast and builds a freebsd image in only about 10 megs of space. It lets you quickly create new jail environments or system images for small embedded platforms.
If you look at the script itself, you'll get an idea of what it installs. I
used a variant of this script to build the system I run on my Soekris net4501
which runs FreeBSD and is under 20 megs.
There are lots of "make a small freebsd system" scripts, but most of the ones
I've found rely heavily on 'buildworld' and what not. This takes a live system
and copies the binaries you need, then uses ldd(1) to track down required
libraries.
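The ldd step can be sketched roughly like this. This is not minibsd.sh itself: lib_paths and copy_with_libs are hypothetical names, and the parsing assumes ldd prints each library as "name => /path (address)", optionally after a "/path/to/binary:" header line. Paths containing spaces are not handled.

```shell
# Read ldd output on stdin and print the absolute library paths, one per line.
# The header line ("/bin/ls:") is skipped because it ends in a colon.
lib_paths() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^\/.*[^:]$/) { print $i; break }
        }
    }'
}

# Copy a binary plus the libraries ldd reports into a new root directory,
# preserving the original directory layout.
copy_with_libs() {
    bin=$1
    root=$2
    for f in "$bin" $(ldd "$bin" | lib_paths); do
        mkdir -p "$root$(dirname "$f")"
        cp "$f" "$root$f"
    done
}
```

Something like "copy_with_libs /bin/sh ./soekris" would then put /bin/sh and its libraries under ./soekris.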
view minibsd.sh
Example usage:
kenya(~/t) % rm -rf ./soekris/
kenya(~/t) % time sudo ./minibsd.sh
sudo ./minibsd.sh 0.16s user 0.65s system 61% cpu 1.326 total
kenya(~/t) % sudo chroot ./soekris /bin/sh
# pwd
/
# exit
Simple jail config (rc.conf):
jail_enable="YES"
jail_list="test"
jail_test_rootdir="/home/jls/t/soekris"
jail_test_hostname="test"
jail_test_ip="10.1.1.1"
jail_test_interface="tl1"
Put something simple in this jail's rc.conf (/home/jls/t/soekris/etc/rc.conf):
sshd_enable="YES"
sendmail_enable="NONE"
Let's test the jail now:
kenya(~/t) % sudo /etc/rc.d/jail start
Configuring jails:.
Starting jails:
At this point, it's probably hung (assuming you enabled sshd). If you hit
CTRL+T you'll see what command has the foreground and what it's doing.* This is
because it's prompting you (output is directed to JAILROOT/var/log/console.log)
for entropy for the ssh-keygen. Smash a few keys then hit enter. It'll finish
eventually.
kenya(~/t) % sockstat -4 | grep 10.1.1.1:22
root sshd 2258 3 tcp4 10.1.1.1:22 *:*
Our sshd is running happily inside that jail we made. This whole process took
about 5 minutes.
* FreeBSD's CTRL+T terminal handler feature has to be the best thing ever
invented. I wish Linux had something like this. Here's what hitting CTRL+T when
running cat looks like:
kenya(~) % cat
load: 0.45 cmd: cat 2324 [ttyin] 0.00u 0.00s 0% 600k
load: 0.42 cmd: cat 2324 [ttyin] 0.00u 0.00s 0% 600k
It clearly shows you the command name, the pid, and the syscall-type-thing it's
doing. Clearly cat is waiting for input from the tty. <3 FreeBSD.
Comments: 10 (view comments)
Tags: automation, freebsd, jails, embedded systems
Permalink: /geekery/mini-freebsd-script
posted at: 03:27
For a current project, I need the ability to dynamically grow and shrink a pool
of mysql slaves. In order for replication to work properly, every slave must
have a unique server id. When you want to grow another slave, how do you choose
the server id?
Two slaves with the same server id will replicate successfully, but when they
reach the end of the master's binary log, something freaks out and forces them
to disconnect. This causes both slaves to reconnect, sync (no data needed), and
have the connection die off quickly again. The result of this is rapid
connection/disconnection by both slaves driving the load to 1+ on both slaves,
and to around .3 on the master even in a completely idle system. This is bad.
Therefore, server id collisions are bad.
A simple approach might be to pick a random number. However, depending on your range, collisions may still occur. If there's even a slight chance of collision, you have to detect that collision and try a new number. Collision detection is expensive and can be done one of a few ways:
- Query all slaves with "show global variables like 'server_id'" and
compare the result against the chosen id. This has O(n) runtime, and doesn't scale.
- Set the server id to whatever you picked at random, have a heuristic tool that can detect the behavior that happens when two server ids collide. This is obviously a horrible idea.
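To put a rough number on "depending on your range", the usual birthday-problem approximation gives the chance that at least two of n randomly chosen ids collide in a space of N values as about 1 - exp(-n(n-1)/(2N)). A quick sketch with awk follows; collision_probability is a made-up name, and the formula is an approximation, not an exact count.

```shell
# Usage: collision_probability <number of slaves> <bits in the id space>
# Prints the approximate probability of at least one id collision.
collision_probability() {
    awk -v n="$1" -v bits="$2" 'BEGIN {
        N = 2 ^ bits
        printf "%.6f\n", 1 - exp(-n * (n - 1) / (2 * N))
    }'
}
```

For example, "collision_probability 1000 32" estimates the risk for 1000 slaves drawing from the full 32 bit range. Shrinking the range (say, 16 bits) makes the number climb quickly.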
Random choice doesn't seem to be very good. Scanning all slaves and picking an
id that isn't in the set of known ids is also bad, as mentioned above. So what now?
We need a number that will never repeat. You might think about using a small
table in the master with an auto_increment column and always get a new id that
way, but why? Time is always increasing. Bonus that mysql's server-id is an
unsigned 32bit value, so unix epoch values will be fine until the distant
future.
A trivial script can generate your my.cnf whenever you bring up a new
slave with the current time as a server id and you're pretty much guaranteed
never to have a collision unless you bring two slaves up in the same second (how
likely is that?).
Simple mysql config:
# my.cnf.in
server-id=SERVERID
Simple script to generate a config with a proper serverid:
#!/bin/sh
m4 -DSERVERID=`date +%s` my.cnf.in > /etc/my.cnf
Make this part of your "add a new mysql slave" setup and you'll have a scalable
server-id selection system.
Alternatively, since mysql server-id values are, again, 32 bit, you can simply
use the IP address of the machine itself. Something like this:
#!/usr/bin/perl
# Turn an IP into an integer for use with mysql server IDs (or whatever)
$exp = 3;
# Each octet is weighted by 256^3, 256^2, 256^1, 256^0 in turn.
map { $x += $_ * (2 ** (8 * $exp--)) } split(/\./, $ARGV[0]);
print "$x\n";
I named it ip2int.pl. You can use Socket's inet_aton and unpack to achieve the
same result here.
% ./ip2int.pl 129.21.60.5
2165652485
Since IPs are in theory unique, you can use the IP of the mysql server for
its own server ID.
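If you'd rather skip perl entirely, the same conversion can be sketched in plain shell arithmetic. This is a different technique from the Socket/unpack approach mentioned above. The function name ip2int is reused for illustration, and it assumes a well-formed dotted quad with no leading zeros in the octets.

```shell
# Convert a dotted-quad IPv4 address to its 32 bit integer value.
ip2int() {
    # Split the address on dots into four octets.
    IFS=. read -r a b c d <<EOF
$1
EOF
    echo $(( ($a << 24) + ($b << 16) + ($c << 8) + $d ))
}
```

"ip2int 129.21.60.5" prints 2165652485, matching the perl script, so it could feed the m4 line in the same way the date command does.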
Comments: 0 (view comments)
Tags: mysql, scaling, dynamic scaling
Permalink: /geekery/simple-mysql-slave-id-scaling
posted at: 02:58
After dinner tonight, Wendy and I took a side trip to Borders. I spent a little
while looking online for a decent "things you should know as a
consultant/independent contractor" book, but none of the ones that looked
promising were in the store.
I gave up and wandered around some more and landed in the computer section.
Turns out tech books haven't gotten any better over the years. There are entire
shelves dedicated to things I want to know the least about: Excel, Vista,
Myspace, AJAX. I haven't bought a tech reference book in ages for the simple
reason that they all suck. Sure, they've got useful information, but my
questions are answered much more quickly by a few quick Google searches.
Something in me said "get a book" - which is strange, because I usually can't
find the time to read. Excluding one book, I haven't read anything in full
since high school.
That one book was Silence on the Wire, a great book on passive
reconnaissance. It wasn't a novel, it
was a technical book. Rather, it was a technical narrative. It read like a
novel, but the content was similar to a reference manual. It was well written,
and enjoyable to read. If you're a technical guy, and have some interest in
security, then check the book out. Totally worth the read, and bonus that I
learned a few things.
Anyway, back at Borders. I was out of place here among the stacks of Excel,
Myspace, and AJAX for Dummies. Blah. On the top shelf was "Code 2.0", the 2nd
edition of Lawrence Lessig's original. I read the preface, and it looked
interesting. I'll post a review when (if?) I finish it.
Back at my original thought - am I the only one who fails to find real value in
most tech books? I had bad experiences with "learn this programming language"
books. I just can't get into them. Most of them fail for various reasons.
Most programming books have the first 6 chapters filled with the same data as all
the rest:
- Chapters 1-3: Computers are not scary.
- Chapters 4-5: You can make computers do things!
- Chapter 6: This is a variable. This is an if statement.
- Chapter 7: Oh, you're still here? Hmm, guess I should start talking about how to use Python.
- Chapter 8: Hello World!
- Chapter 9: Thanks for your $50.
Speaking of Python, I highly recommend diveintopython.org if you want to learn
it and are already familiar with programming. Buy the book or read it online -
choice is good.
I've found the most value in pocket-type references, for the same reason short
papers are often better written and more informative than longer ones: you've
got to cut everything that isn't absolutely necessary. I wish more books
did this. Who wants a 1200 page book on Microsoft Word anyway?
Comments: 3 (view comments)
Tags: reading, internet culture
Permalink: /geekery/new-book
posted at: 04:46