





Shebang (#!) fix.

Most shebang implementations seem to behave contrary to my expectations.

As an example, prior to today, I would have expected the following script to output 'debug: true':

#!/usr/bin/env ruby -d
puts "debug: #{$DEBUG}"
Running it, I get this:
% ./test.rb
/usr/bin/env: ruby -d: No such file or directory
This is because the 'program' executed is '/usr/bin/env', and the first argument passed to it is the entire string 'ruby -d' (followed by the script's path), exactly as if you had run: /usr/bin/env "ruby -d" ./test.rb

My expectation was that the above would behave exactly like this:

% /usr/bin/env ruby -d test.rb
debug: true
It doesn't. The workaround, however, is pretty trivial: a few lines of C give me a program that works the way I want. I call the program 'shebang'. Why C and not a script? Because most platforms require the program named on the shebang line to be a binary, not another script.

#!/usr/local/bin/shebang ruby -d
puts "debug: #{$DEBUG}"
Now run the script again, with our new shebang line:
% ./test.rb
debug: true
Simple and works perfectly.
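The C source for 'shebang' isn't reproduced here. As a rough sketch of the idea only, here is the argument rewriting in Python: the kernel hands the helper the whole option string as a single argument, followed by the script path, and the helper splits that string on whitespace and execs the result. Per the point above, a Python script can't itself sit on the shebang line on platforms that demand a binary interpreter, and rewrite_argv is a made-up name for illustration.

```python
import os
import sys


def rewrite_argv(argv):
    """Turn ['shebang', 'ruby -d', './test.rb', ...] into
    ['ruby', '-d', './test.rb', ...].

    argv[1] is everything after the helper's path on the #! line, which
    the kernel passed as one argument; argv[2:] is the script path plus
    any arguments given on the command line.
    """
    interpreter = argv[1].split() if len(argv) > 1 else []
    if not interpreter:
        raise ValueError("usage: shebang 'interpreter [options]' script [args...]")
    return interpreter + argv[2:]


# Only exec when we were given both an option string and a script path.
if __name__ == "__main__" and len(sys.argv) > 2:
    new_argv = rewrite_argv(sys.argv)
    # execvp searches $PATH for the interpreter, much as /usr/bin/env does,
    # and replaces this process with it.
    os.execvp(new_argv[0], new_argv)
```

You can exercise it by hand the same way the kernel would call it, e.g. python3 shebang.py 'ruby -d' ./test.rb (the file name is hypothetical).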

Week of unix tools; day 5: xargs

Day 5 is online. It's about how to rock out with your friend, xargs(1).

day 5; xargs

Week of unix tools; day 4: data source tools

Day 4 is finally ready for consumption, a bit late ;)

This article touches on cat, nc, ssh, openssl, GET, wget, w3m, and others. It's designed to show you a pile of tools you can use to pull data from various places.

day 4; data sources

Week of unix tools; day 3: awk!

Day 3 is ready for viewing. It's about awk.

This article has lots of usage examples for the many ways you can use awk to do hard work for you. Check out the article here:

day 3: awk

Week of unix tools; day 2: cut and paste!

Day 2 is ready for viewing. It's about cut and paste.

Candice suggested these two tools: they're underutilized, yet very useful when you need the features they provide.

Check out the article here:

day 2: cut and paste

Week of unix tools; day 1: Sed!

Day 1 is ready for viewing. It's about sed, something I feel many sysadmins (and others) neglect in favor of perl, awk, or other tools. It's a super useful tool. Check out the article here:

Day 1: sed

Week of unix tools!

This past week of work had me stretching my piping-ninja skills to the limits. In the past 4 days, I have created a oneliner that invoked xargs 3 times in a series of piped expressions, a oneliner that had xargs calling xargs, a oneliner that was well over 1000 characters and invoked sed, xargs, grep, perl, awk, xpathtool and others, sometimes twice, and several other ninja-like uses of filter-fu involving unix pipes. It's been a fun week for oneliners.

So, I thought I might spend the next 7 days covering some of the tools I find myself working with almost every day. I hope to cover all the ways I use a given tool, when it should be used over any alternatives, things to look up in the manpage, etc.

I'll try to cover more than one tool per day, since some tools can be batched together when there isn't much to say other than "this is useful for foo, bar, and baz".

Tentative, incomplete list:

data sources: cat, GET, echo, nc (netcat), shell command substitution
cut, grep, paste, sed, sort, tr, uniq, wc
xargs, sh, awk, perl (for oneliners)
Here's a sneak peek at what you might learn:

'sort' can sort lines, right? But how do you sort words on the same line?

% echo "one two three four five" \
  | xargs -n1 | sort | xargs
five four one three two
Some of you might notice that the xargs(1) invocations can be replaced with tr(1). Yes. I use xargs(1) because it's less typing and handles many cases tr(1) won't, such as runs of spaces or tabs between words. This is not to say tr(1) doesn't have its wonderful uses. More on that later this week ;)
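To make the tr(1) point concrete, here's a small Python analogue (not part of the shell pipeline; sort_words is a made-up name). Splitting on any run of whitespace is roughly what xargs -n1 does, while splitting on a single space is what tr ' ' '\n' does, which leaves empty fields when words are separated by more than one space.

```python
def sort_words(line):
    """Rough Python analogue of: echo "$line" | xargs -n1 | sort | xargs"""
    # str.split() with no argument splits on any run of spaces, tabs or
    # newlines, like xargs; sorted() compares code points, which matches
    # sort(1) in the C locale.
    return " ".join(sorted(line.split()))


line = "one  two\tthree"
# tr ' ' '\n'-style splitting: the double space yields an empty field
# (a blank line, in shell terms) and the tab is left alone.
print(line.split(" "))  # ['one', '', 'two\tthree']
# xargs-style splitting: only the words survive.
print(line.split())  # ['one', 'two', 'three']
print(sort_words("one two three four five"))  # five four one three two
```

One caveat: xargs also treats quotes and backslashes in its input specially, which str.split() does not.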

If you have suggestions, let me know.

Wakeup call using /bin/sh and at(8)

Using at(8), I can schedule jobs to run at a given time, say, when I need to wake up.
nightfall(~) % atq
Date                            Owner           Queue   Job#
Thu Jul 13 07:30:00 PDT 2006    jls             c       14
Thu Jul 13 08:00:00 PDT 2006    jls             c       15
Thu Jul 13 08:20:00 PDT 2006    jls             c       16
All of those jobs run my '' script, which is somewhat primitive, but it does the job.

Using this script: scripts/

Parallelization with /bin/sh

I have 89 log files. The average file size is 100ish megs. I want to parse all of the logs into something else useful. Processing 9.1 gigs of logs is not my idea of a good time, nor is it a good application for a single CPU to handle. Let's parallelize it.

I abuse /bin/sh's ability to background processes and wait for its children to finish. I have a script that takes a pool of available computers and sends tasks to them. Each task is just "process this apache log", but the speedup of parallel execution over a single process is incredible, and it's very simple to do in the shell.

The script to perform this parallelization is here:

I define a list of hosts in the script and pass the list of logs to process on the command line. The host list is repeated until it is at least as long as the list of logs. I then take each log and send it off to a server with ssh, which runs a script that writes its results to stdout. Output is captured to a file named after the hostname and the pid.
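The script itself isn't included here, so this is only a sketch of the same scheme in Python. It assumes passwordless ssh to each host and a hypothetical remote command named process-log that prints its results to stdout. Cycling through the host list is equivalent to repeating it until it covers every log; Popen plays the role of sh's '&', and waiting on each child plays the role of 'wait'. Output files here are named by host and job index rather than by pid.

```python
import itertools
import subprocess


def assign_hosts(hosts, logs):
    """Pair each log with a host, cycling through the host list as many
    times as needed (the same as repeating the list until it is at least
    as long as the list of logs)."""
    return list(zip(itertools.cycle(hosts), logs))


def dispatch(hosts, logs, remote_cmd="process-log"):
    """Start one ssh job per log, all at once, then wait for all of them.

    remote_cmd is a hypothetical script on each host that parses one log
    and writes its results to stdout. Returns the jobs' exit statuses.
    """
    jobs = []
    for i, (host, log) in enumerate(assign_hosts(hosts, logs)):
        out = open(f"{host}.{i}.out", "wb")
        # Like `ssh host process-log LOG > file &` in sh: start it, move on.
        proc = subprocess.Popen(["ssh", host, remote_cmd, log], stdout=out)
        jobs.append((proc, out))
    statuses = []
    # Like sh's `wait`: block until every child has exited.
    for proc, out in jobs:
        statuses.append(proc.wait())
        out.close()
    return statuses
```

With hypothetical host names, a run would look like dispatch(["web1", "web2"], sys.argv[1:]).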

I didn't run it single-process to completion to compare running times; however, parallel execution gets *much* farther in 10 minutes than a single process does. Sweet :)

Some of the log files are *enormous*, taking up 1 gig each on their own. I'm experimenting with split(1) to break these files into chunks of 100,000 lines each. The problem is that all of the tasks finish except the 4 processes handling the 1 gig log files (there are 4 of them). Splitting makes the individual jobs smaller, so the work load is spread more evenly across processes and the whole run finishes sooner.
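split -l 100000 does the chunking for you; as a rough illustration of what it produces, here is the same grouping in Python. The function names are made up, and the output names mimic split(1)'s defaults (prefix 'x' plus two-letter suffixes 'aa', 'ab', and so on).

```python
import itertools


def split_suffixes():
    """Yield split(1)'s default two-letter suffixes: aa, ab, ..., zz."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    for a, b in itertools.product(letters, repeat=2):
        yield a + b


def split_lines(lines, n=100_000, prefix="x"):
    """Group an iterable of lines into (name, chunk) pairs of at most n
    lines each, named the way `split -l n` names its output files."""
    it = iter(lines)
    for suffix in split_suffixes():
        chunk = list(itertools.islice(it, n))
        if not chunk:
            return
        yield prefix + suffix, chunk
    # Only 26 * 26 two-letter names exist; this sketch stops there.
    if next(it, None) is not None:
        raise ValueError("more than 676 chunks; longer suffixes needed")
```

Each chunk can then be written out with open(name, "w").writelines(chunk), and the chunk files passed to the parallelization script in place of the original log.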

So, a simple application benefiting from parallelization is solved by using simple, standard tools. Sexy.

Parsing nfsstat(1) for only version X information

nfsstat | sed -ne '/Version 3/,/^$/p'

When I was bored (at 4 am, no less), I kept trying to parse this information out using some crazy tricks with 'x' (swap pattern and hold space) and other stuff, but I forgot that regexps are valid addresses. So we can print everything from a line matching 'Version 3' through the next blank line, anywhere in our output.
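For anyone who hasn't used regexp addresses, here is a Python sketch of what sed -ne '/Version 3/,/^$/p' does (sed_range_print is a made-up name). Once a line matches the start pattern, every line is printed up to and including the next line matching the end pattern; the end pattern is only checked starting with the line after the one that opened the range, and the range can reopen on a later match.

```python
import re


def sed_range_print(lines, start, end):
    """Emulate `sed -ne '/start/,/end/p'` over an iterable of lines."""
    start_re, end_re = re.compile(start), re.compile(end)
    in_range = False
    for line in lines:
        if not in_range:
            # Outside a range: only a start match opens one (and prints).
            if start_re.search(line):
                in_range = True
                yield line
        else:
            # Inside a range: print, and close it if this line matches end.
            yield line
            if end_re.search(line):
                in_range = False
```
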

The next thing I want to try is to automagically parse nfsstat output into a more machine-readable format. This will probably be done with awk or perl, since doing it with sed may hurt my brain a bit. Furthermore, reading the sed that performed such operations would be somewhat intense ;)

The output looks something like this, on Solaris 9:

Version 3: (535958 calls)
null        getattr     setattr     lookup      access      readlink    
0 0%        242223 45%  20606 3%    52504 9%    20025 3%    41 0%       
read        write       create      mkdir       symlink     mknod       
14138 2%    146618 27%  5525 1%     145 0%      337 0%      0 0%        
remove      rmdir       rename      link        readdir     readdirplus 
6279 1%     7 0%        1539 0%     1518 0%     1606 0%     6587 1%     
Parsing this would mean generating a tree-like dictionary. In perl, it may look like:
%foo = (
	'Version 3' => {
		null    => 0,
		getattr => 242223,
		setattr => 20606,
		lookup  => 52504,
		# .... etc ...
	},
);
Should be simple enough; we'll see what happens the next time I get bored.
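Here's one possible shape for that parser, sketched in Python rather than awk or perl and assuming the layout shown above: a 'Version N: (M calls)' header, then alternating rows of operation names and 'count percent%' pairs, ending at a blank line. parse_nfsstat is a made-up name, and real nfsstat output has other sections (server, client, ACL) that this sketch doesn't distinguish, so treat it as a starting point.

```python
import re

# Matches section headers such as "Version 3: (535958 calls)".
HEADER = re.compile(r"^(Version \d+): \(\d+ calls\)")


def parse_nfsstat(text):
    """Return {'Version 3': {'null': 0, 'getattr': 242223, ...}, ...}."""
    stats = {}
    current = None  # dict for the section being read, if any
    names = None    # operation names from the previous row, if any
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            current, names = None, None  # a blank line ends a section
            continue
        m = HEADER.match(line)
        if m:
            current = stats.setdefault(m.group(1), {})
            names = None
            continue
        if current is None:
            continue  # not inside a Version section
        if names is None:
            names = fields  # a row of operation names
        else:
            # A row of "count percent%" pairs: keep every other field.
            counts = [int(f) for f in fields[0::2]]
            current.update(zip(names, counts))
            names = None
    return stats
```
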