Flex start conditions

I finally (after a bit of searching and thinking) figured out how to properly match comments with flex/bison. Hooray :)

Some tutorials use this in flex:

#.*$
But since flex chooses longest-match-first for the next token, any line with a # in it might have the remainder of the line accidentally cut as a comment. The right way to do this appears to be:
/* in your .lex file */

%x LEX_COMMENT

%%

"#" { BEGIN(LEX_COMMENT); }
<LEX_COMMENT>[^\n]*  /* ignore comments. */
<LEX_COMMENT>\n   { yylineno++; BEGIN(INITIAL); } /* end comment */
There's a mini state machine going on here. When the scanner matches a "#" it moves into the 'LEX_COMMENT' state (a name I chose; it could be anything), where only rules tagged with that state are active. Now my config files can ignore comments properly: a "#" only starts a comment when it isn't part of some other token (like a quoted string).
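The same idea can be sketched outside flex. This is only a rough Python illustration of the state machine, not what flex generates; the function name `strip_comments` and the quoted-string state are assumptions added to show a "#" inside another token being left alone.

```python
def strip_comments(text):
    """Drop '#' comments, but leave '#' alone inside double-quoted strings."""
    INITIAL, LEX_COMMENT, IN_STRING = "INITIAL", "LEX_COMMENT", "IN_STRING"
    state = INITIAL
    out = []
    for ch in text:
        if state == INITIAL:
            if ch == "#":
                state = LEX_COMMENT      # like BEGIN(LEX_COMMENT)
            else:
                if ch == '"':
                    state = IN_STRING    # hypothetical quoted-string token
                out.append(ch)
        elif state == IN_STRING:
            out.append(ch)
            if ch == '"':
                state = INITIAL
        else:
            # LEX_COMMENT: swallow everything up to the newline
            if ch == "\n":
                out.append(ch)
                state = INITIAL          # like BEGIN(INITIAL)
    return "".join(out)


print(strip_comments('name "a#b" # note\nport 22\n'))
```

The "#" inside the quotes survives because the comment rule is only reachable from the INITIAL state.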

Details here

Brute force ssh goes distributed

I was working with grok tonight when I noticed this in a randomly-selected machine's logs:
Nov 26 02:12:53 scorn sshd[77981]: error: PAM: ... christmas from 124.42.124.87
Nov 26 02:14:46 scorn sshd[77987]: error: PAM: ... christmas from 83.16.61.114
Nov 26 02:18:49 scorn sshd[78035]: error: PAM: ... christoffer from 220.199.6.2
Nov 26 02:20:33 scorn sshd[78047]: error: PAM: ... christoffer from 124.42.124.87
Nov 26 02:26:21 scorn sshd[78071]: error: PAM: ... christopher from 70.46.140.187
Nov 26 02:28:18 scorn sshd[78074]: error: PAM: ... christos from 80.32.193.169
Nov 26 02:30:16 scorn sshd[78085]: error: PAM: ... christos from 201.161.28.9
Nov 26 02:34:17 scorn sshd[78104]: error: PAM: ... christy from 200.181.121.26
Nov 26 02:36:12 scorn sshd[78126]: error: PAM: ... christy from 211.154.254.89
Nov 26 02:38:09 scorn sshd[78129]: error: PAM: ... christy from 58.39.145.213
Nov 26 02:40:08 scorn sshd[78149]: error: PAM: ... chroma from 62.97.62.155
Nov 26 02:42:10 scorn sshd[78164]: error: PAM: ... chroma from 83.19.224.11
Nov 26 02:44:02 scorn sshd[78185]: error: PAM: ... chroma from 189.43.21.244
Nov 26 02:45:57 scorn sshd[78223]: error: PAM: ... chuck from 200.248.82.130
(I trimmed the lines horizontally for content)

The usual pattern of dictionary-ordered username attempts and two-minute intervals was there, but the anomaly here was that the source host was changing.

This is new to me as it looks like the botnets that walk around trying to brute force ssh access have gotten distributed. Instead of a single host walking usernames, multiple hosts are doing it.

That's awesome.

As a side note, this probably puts the kibosh on non-collaborative IDS tools that ban repeated, failed ssh attempts from a single host.
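To make that concrete, here is a small Python sketch that counts failures per source host using the first five lines of the log excerpt. The regular expression is written against the trimmed lines as shown, and `THRESHOLD` is a hypothetical ban limit, not a setting from any particular tool.

```python
import re
from collections import Counter

# Lines copied from the log excerpt in this post (already trimmed).
LOG = """\
Nov 26 02:12:53 scorn sshd[77981]: error: PAM: ... christmas from 124.42.124.87
Nov 26 02:14:46 scorn sshd[77987]: error: PAM: ... christmas from 83.16.61.114
Nov 26 02:18:49 scorn sshd[78035]: error: PAM: ... christoffer from 220.199.6.2
Nov 26 02:20:33 scorn sshd[78047]: error: PAM: ... christoffer from 124.42.124.87
Nov 26 02:26:21 scorn sshd[78071]: error: PAM: ... christopher from 70.46.140.187
"""

FAILURE = re.compile(r"sshd\[\d+\]: error: PAM: .* (\S+) from (\S+)$")


def count_failures(log_text):
    """Return (failures per host, failures per username)."""
    by_host, by_user = Counter(), Counter()
    for line in log_text.splitlines():
        m = FAILURE.search(line)
        if m:
            user, host = m.groups()
            by_user[user] += 1
            by_host[host] += 1
    return by_host, by_user


by_host, by_user = count_failures(LOG)
THRESHOLD = 3  # hypothetical per-host ban threshold
banned = [h for h, n in by_host.items() if n >= THRESHOLD]
print(sum(by_host.values()), "failures,", len(by_host), "hosts,", len(banned), "banned")
```

No single host gets past two failures in this sample, so a per-host counter never trips even though the attack as a whole keeps going.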

Systems Administration Advent Calendar

I like the Perl Advent Calendar and have followed it for a few years now (even as my Perl usage has declined).

Some quick googling didn't find any hits for a similar thing for systems administration, so I'm starting one.

I need your help. Interested in contributing ideas or content? Email me at jls-sa@semicomplete.com.

Upgraded to ubuntu 8.10

Upgraded my workstation to ubuntu 8.10 without much stress. Hooray.

Feels like firefox is running slower now, but that might just be perceived slowness or network crappiness. Still, I ran top(1) and saw some new 'chipcardd4' process waking up every few seconds and bursting cpu. It wasn't doing enough to cause problems, but I'd never seen it before. Checking strace showed it in a loop sleeping for 750ms, twice, then reading through /sys/devices/. Doing "sudo ./libchipcard-tools stop" made it go away.

Doesn't linux have inotify so programs don't have to do sleepy-loops checking the filesystem?

Powershell to remove dotfiles

I just rsync'd all willy-nilly (not something I recommend). I copied stuff I was working on to a test windows box. Poking around, I realized I copied over a pile of dotfiles (.svn, vim backup files, etc).

In any bourne shell, this fix would be a simple find invocation (depending on how modern your find(1) is):

 find somepath -name '.*' -delete 
Back in Windows, my first reaction was to be sad because I didn't know a simple oneliner to do the same thing. Then, I remembered powershell was installed and made this kind of stuff easy.
 dir -recurse | where {$_.name -match "^\."} | rm 
Delicious.
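For a box with neither find(1) nor powershell, roughly the same cleanup can be sketched in Python. The function name `remove_dotfiles` and the `dry_run` flag are my own additions for illustration; it removes dot-directories such as .svn along with their contents, so it defaults to only listing what it would delete.

```python
import os
import shutil


def remove_dotfiles(root, dry_run=True):
    """Return paths under root whose names start with '.'; delete them unless dry_run."""
    removed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in list(dirnames):
            if name.startswith("."):
                path = os.path.join(dirpath, name)
                removed.append(path)
                dirnames.remove(name)    # don't descend into a directory we are removing
                if not dry_run:
                    shutil.rmtree(path)
        for name in filenames:
            if name.startswith("."):
                path = os.path.join(dirpath, name)
                removed.append(path)
                if not dry_run:
                    os.remove(path)
    return sorted(removed)
```

Calling `remove_dotfiles("somepath")` first and reading the list before passing `dry_run=False` is the cautious way to use it.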

Breaking efnet's silly figlet captcha

This was all done as a fun experiment to see if automating the efnet captcha was doable.

A few (all? many?) efnet servers use a figlet captcha on irc clients connecting from hosts that aren't running identd. While this blends happily with the same kind of captcha I put into pam_captcha, it's too easy to break.

Specifically, it uses 6 characters, A-Z. Generating a lookup table is as easy as a few lines of code. The lookup table for all combinations, generated with the previous script, would be almost 11 gigs. It stores MD5 values of figlet output instead of the figlet output itself to save space and make for simpler lookups (40 bytes per entry, including newline, uncompressed).

However, if you don't answer correctly within a short period, you get disconnected. Timing it shows you have 30ish seconds. It's probably not feasible to grep through 11 gigs of data in 30 seconds, is it? That's reading through almost 400 mbytes per second. Then again, that's if you store it as a flat, unsorted structure.
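The back-of-the-envelope numbers can be checked with a few lines of Python. The 40-byte layout (6-character token, a space, 32 hex digits of MD5, a newline) is my assumption about how the entry size breaks down; the 30-second window is the rough timing mentioned above.

```python
ALPHABET = 26                  # A-Z
LENGTH = 6                     # characters per captcha
ENTRY_BYTES = 6 + 1 + 32 + 1   # token, space, MD5 hex digest, newline (assumed layout)
SECONDS = 30                   # rough time limit before disconnect

entries = ALPHABET ** LENGTH
table_bytes = entries * ENTRY_BYTES
gib = table_bytes / 2**30
mib_per_second = table_bytes / SECONDS / 2**20

print(f"{entries:,} entries")
print(f"{gib:.1f} GiB total")
print(f"{mib_per_second:.0f} MiB/s to scan it all in {SECONDS} seconds")
```

That works out to a bit over 300 million entries, in the same ballpark as the sizes quoted here.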

If you sort the data by MD5, you get the benefits of a binary search, which finds you a result in about 29 iterations (log2 of 26^6 is roughly 28.2). Doing binary search in ruby (like most languages) is very simple. Here's bsearch.rb
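For illustration, here is a minimal Python sketch of a binary search over such a file. It is not the ruby script mentioned above; it assumes fixed-width 40-byte 'token md5' lines sorted in plain byte order on the md5 field, and the name `lookup` is hypothetical.

```python
import io

RECORD = 40                  # bytes per line: 6-char token, space, 32 hex digits, newline
KEY_START, KEY_END = 7, 39   # where the md5 field sits inside a record


def lookup(f, md5_hex):
    """Binary-search a binary-mode file of fixed-width records sorted by md5."""
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell() // RECORD
    want = md5_hex.encode("ascii")
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(mid * RECORD)         # fixed width means record N starts at N * RECORD
        rec = f.read(RECORD)
        key = rec[KEY_START:KEY_END]
        if key == want:
            return rec[:6].decode("ascii")
        if key < want:
            lo = mid + 1
        else:
            hi = mid
    return None
```

Each pass halves the remaining range and costs one seek plus one 40-byte read, so the number of reads grows with log2 of the entry count rather than with the file size.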

The output is 'token md5' lines, about 11 gigs of data. GNU sort is smart enough to use disk for merge sorting on large files. However, I did this first, since I assumed sort would be dumb and try to sort everything in memory:

choplog -x -p /b/split -b $((50 << 20)) /c/captchas \
| xargs -n1 -tP2  sh -c 'sort -k2 "$1" > /b/sort.$(basename "$1")' -
sort -k2 --merge /b/sort.* > /b/sortedcaptchas
choplog is a project I did last year when I needed a fast way to split large logfiles (GNU split is slower and less-featured for this task). I split the output into 50 meg chunks, sort each chunk, then use sort's merge feature to merge all the data back together quickly.

As it turns out, I don't need to do any of the above splitting and sorting because gnu sort is smart enough to properly merge sort on-disk for really large files. You can tweak the memory buffer size with the -S flag and the temp directory with -T. The manpage says you can specify buffer sizes with unit notations (M, G, etc) and they go all the way up through E, Z, and Y... just in case you have a yottabyte of memory? ;)
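The split, sort, and merge approach is a plain external merge sort. Here is a small Python sketch of the idea, not of how GNU sort is implemented: the name `external_sort` is hypothetical, and chunks are sized by line count rather than bytes to keep the example short.

```python
import heapq
import os
import tempfile


def external_sort(lines, chunk_size, key, tmpdir):
    """Yield newline-terminated lines in sorted order using bounded memory."""
    paths, chunk = [], []

    def flush():
        # Sort one chunk in memory and spill it to its own temp file.
        if chunk:
            fd, path = tempfile.mkstemp(dir=tmpdir, text=True)
            with os.fdopen(fd, "w") as out:
                out.writelines(sorted(chunk, key=key))
            paths.append(path)
            chunk.clear()

    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            flush()
    flush()

    # Merge the sorted chunk files lazily, like `sort --merge`.
    files = [open(p) for p in paths]
    try:
        yield from heapq.merge(*files, key=key)
    finally:
        for f in files:
            f.close()
        for p in paths:
            os.remove(p)
```

Passing `key=lambda line: line.split()[1]` sorts on the md5 field, which plays the role of `-k2` in the shell version.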

While I was waiting for the table to generate, I started fetching a few captchas for testing. It seems like the server I'm connecting to is using a different version of figlet or a different version of the fonts, or figlet is being invoked differently. The spacing between only some letters is off.

I can reliably get results if I figlet each letter and paste them together like:

# This matches efnet's captcha output
paste -d "" <(figlet -fbig W) <(figlet -fbig S)
instead of
# This doesn't match efnet's captcha output
figlet -fbig WS
Playing with the kerning options (using -k or -W) doesn't produce the right output either, only pasting together does.
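If the table generator needs the same behavior without shelling out to paste, the line-by-line joining is easy to sketch in Python. The helper name `paste` is my own, and the glyphs in the usage line are made-up placeholders, not real figlet output.

```python
from itertools import zip_longest


def paste(*blocks):
    """Join multi-line blocks side by side with no delimiter, like `paste -d ""`."""
    columns = [block.splitlines() for block in blocks]
    # Shorter blocks contribute empty strings once they run out of lines.
    rows = zip_longest(*columns, fillvalue="")
    return "\n".join("".join(parts) for parts in rows)


# Placeholder 3-line glyphs, only to show the joining.
print(paste("#  #\n####\n#  #", " ## \n#  #\n ## "))
```

Each glyph keeps its full width, so nothing between neighboring letters gets squeezed or overlapped.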

Pretty close to automatically passing the captcha, but I stopped caring about it. I've run out of energy working on this. I did learn a few edge case bits though about gnu sort and had a reasonable excuse to dork around with ruby that didn't involve $work. It also reminded me how much muscle memory I still have for using xargs.