I think it's fair to say that not enough people know sed, mostly because it looks scary. This week-of-unix-tools is intended to be a high concentration of information with little fluff. For the sake of sanity, I'll only cover the GNU versions of the tools.
Sed is short for 'stream editor' and basically lets you do lots of things
to streams of text.
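sed [-nru] [-e 'sedscript'] [file1 file2 ...]
-n silences the default output (normally sed prints every line once the script has run on it),
-r means use extended regular expressions,
-u means unbuffered (ie; flush output as you go instead of in big chunks; on FreeBSD, -l gives line-buffered output, but in GNU sed -l sets the line-wrap length for the l command), and
-e adds an expression to the script; you can give it more than once.
There are other flags (such as -f, which reads the script from a file) but I never use them. See the man page for more information.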
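To see what -n and -e actually change, here's a quick sketch you can paste into a shell. It uses printf rather than echo, because whether echo expands \n depends on which shell you're in.

```shell
# Without -n, sed prints every line when the script is done with it,
# so an explicit 'p' makes the matching line show up twice.
printf 'one\ntwo\n' | sed -e '/two/p'          # one, two, two

# With -n, only lines printed explicitly (here by 'p') are output.
printf 'one\ntwo\n' | sed -n -e '/two/p'       # two

# Multiple -e expressions are combined into one script and run in order.
printf 'one\ntwo\n' | sed -e 's/one/1/' -e 's/two/2/'   # 1, 2
```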
If you've ever seen the perlism s/foo/bar/, that came from sed. Sed is basically a string processing language. The language has a very small grammar, but it is still very powerful. Here are some examples:
- Simple text replacement.
% echo "Hello there foo" | sed -e 's/foo/bar/'
Hello there bar
- Grep-like behavior.
% sed -ne '/FreeBSD/p' /etc/motd
FreeBSD 6.2-PRERELEASE (FOO) #0: Sat Nov 11 00:12:52 EST 2006
Welcome to FreeBSD!
- grep -v-like behavior (print only lines that don't match)
% printf 'foo\nbar\nbaz\nfoobar\n' | sed -ne '/foo/!p'
bar
baz
A backreference reuses the text matched by a captured group later in your pattern or replacement. You group regexp patterns with parentheses, but in non-extended mode (ie; without -r), you must escape the parentheses. Example:
% echo "hello world" | sed -e 's/\([a-z]*\) world/\1 sed/'
hello sed
# Now with -r (or -E on FreeBSD and OS X):
% echo "hello world" | sed -r -e 's/([a-z]*) world/\1 sed/'
hello sed
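Groups are numbered \1 through \9, counting opening parentheses from the left. A small sketch with two groups that swaps the words of the same input:

```shell
# Basic regex: the grouping parentheses must be escaped.
echo "hello world" | sed -e 's/\([a-z]*\) \([a-z]*\)/\2 \1/'

# Extended regex (-r; -E on FreeBSD and OS X): bare parentheses.
echo "hello world" | sed -r -e 's/([a-z]*) ([a-z]*)/\2 \1/'

# both print: world hello
```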
Substitution (s///) has one more special "reference": the ampersand (&), which expands to the entire matched text:
% echo "hello world" | sed -e 's/.*/I say, "&"/'
I say, "hello world"
Sed syntax is pretty straightforward. A general expression looks like this:
address[,address]function
That's it. Expressions are separated by newlines or semicolons.
An address is a way to indicate a location in your data stream. An address can be any of:
- A line number (eg 1). The first line is '1'
- A regexp match expression, such as /foo/.
- The literal '$', which means 'last line of file'
- Nothing at all, which means "every line in the file"
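Here's each kind of address from the list in action. This sketch assumes GNU coreutils' seq is available to generate numbered input lines; most commands use -n plus the p function, so only the addressed lines print.

```shell
seq 5 | sed -n '2p'           # a line number: prints 2
seq 5 | sed -n '/4/p'         # a regex match: prints 4
seq 5 | sed -n '$p'           # the last line: prints 5
seq 3 | sed -e 's/^/line /'   # no address: line 1, line 2, line 3

# Two expressions separated by a semicolon (a newline works too):
seq 5 | sed -n '1p;$p'        # prints 1 and 5
```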
If you specify two addresses, the function applies to the range from the line matching the first address through the line matching the second, inclusive. Once the second address has matched, sed starts looking for the first address again further down the file, so one range expression can select several blocks.
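A short sketch of ranges on made-up input (seq is again assumed to be the GNU coreutils one). Note how the /start/,/end/ range fires twice because the search for /start/ resumes after each /end/:

```shell
# Two regex addresses: each block from a /start/ line through the next
# /end/ line is printed, inclusive.
printf 'a\nstart\nb\nend\nc\nstart\nd\nend\ne\n' | sed -n '/start/,/end/p'
# prints: start, b, end, start, d, end (one per line)

# Line numbers and '$' work as range ends too.
seq 10 | sed -n '3,5p'    # 3, 4, 5
seq 10 | sed -n '8,$p'    # 8, 9, 10
```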
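Functions are always one letter in sed. The useful ones (to me) are:
- p (print the pattern space)
- s (substitute)
- d (delete the pattern space and start the next cycle)
- q (quit)
- x (swap the pattern space and the hold buffer)
- h and H (copy or append the pattern space to the hold buffer)
- ! (not really a function: placed after an address, it applies the function to lines that do *not* match)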
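The hold buffer functions are easier to grasp with a tiny example. This sketch (the input is made up) prints each line matching foo together with the line before it, a bit like grep -B1 without the separators:

```shell
# If the line matches /foo/: swap in the previous line (x), print it,
# swap back (x) and print the match. Then, for every line, copy it into
# the hold buffer (h) so it becomes "the previous line" next time.
printf 'a\nb\nfoo\nc\n' | sed -n '/foo/{x;p;x;p;};h'
# prints: b, foo
```

If the very first line matches, the hold buffer is still empty at that point, so an empty line is printed before the match.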
Here are a few examples:
- Print the first line of input (same as head -n 1)
sed -ne 1p
- Print everything *except* the first line
sed -ne '1!p' # print everything not on the first line
or
sed -e '1d' # delete the first line
# default action is to print, so everything else is printed
- Print the first non-whitespace, non-comment line in httpd.conf
sed -ne '/^[^# ]/{p;q;}' httpd.conf
or
sed -ne '/^#/! { /^ *$/! { p;q; }; }' httpd.conf
- Show only 'Received:' headers in a mail message
% cat mymail \
| sed -ne '/^[A-Za-z0-9]/ { x; /^Received: /{p;}; }; /^[A-Za-z0-9]/!H'
Received: from localhost (localhost [127.0.0.1])
by whitefox.csh.rit.edu (Postfix) with ESMTP id 731F81145C
for <email-snipped>; Sat, 19 May 2007 01:19:30 -0400 (EDT)
Received: from whitefox.csh.rit.edu ([127.0.0.1])
by localhost (whitefox.csh.rit.edu [127.0.0.1]) (amavisd-new, port 10024)
with ESMTP id EURHKUeHSrao for <email-snipped>;
Sat, 19 May 2007 01:19:16 -0400 (EDT)
... etc ...
Noisy code, eh? It gets the job done, though. There are two checks here. The first checks whether the line starts with a letter or number; if so, it swaps the line into the "hold" buffer, then checks whether what came out (the previous header) starts with 'Received:' and prints it if it does. The side effect is that the current input line is now in the hold buffer and the old header is in the pattern space, which we discard (with -n, nothing is printed unless we ask). After that, we check if the line does *not* start with a letter or number (a continuation line), in which case we append the input (aka pattern space) to the hold space. Basically, we build the current header (which can be multiple lines) in the hold buffer until the next header arrives.
- Output a file, but color matched patterns.
# The '^[' below are raw escape characters, entered at the shell
# with CTRL+V and hitting escape.
% dmesg | sed -e 's/ath0/^[[33m&^[[0m/g'
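If typing a raw escape with CTRL+V is awkward (in a script, say), one option is to let printf produce the escape character and splice it into the sed expression. A sketch, where highlight is a made-up function name; the pattern is pasted into the s command as-is, so it must not contain a '/'.

```shell
# Store a real ESC character (octal 033) in a variable.
esc=$(printf '\033')

# highlight PATTERN: copy stdin to stdout, wrapping every match of
# PATTERN in yellow (33) followed by a color reset (0).
highlight() {
    sed -e "s/$1/${esc}[33m&${esc}[0m/g"
}

echo "ath0: link up" | highlight ath0
```

With dmesg, that would be dmesg | highlight ath0.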

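Back to the Received: headers example: that one-liner keeps going past the empty line that ends the headers, so body lines starting with a letter or digit get treated as headers too. Here's a sketch of a variant using the same swap-and-append idea that quits at the end of the headers. received_headers is a made-up name, and the sketch assumes the headers are followed by an empty line (as in any message with a body).

```shell
#!/bin/sh
# received_headers (hypothetical helper): print each Received: header,
# including its indented continuation lines, from a message on stdin.
#   empty line:       check the last buffered header, then quit
#   header line:      swap it into the hold buffer, check the old one, drop it
#   continuation:     append it to the header in the hold buffer (H)
received_headers() {
    sed -n '
        /^$/ { x; /^Received: /p; q; }
        /^[A-Za-z0-9]/ { x; /^Received: /p; d; }
        H
    '
}

printf 'From: me\nReceived: from a\n\tby b\nSubject: hi\n\nReceived: body text\n' \
    | received_headers
# prints "Received: from a" and the tab-indented "by b" line
```

On a real message, that would be received_headers < mymail.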
You can use sed to "grep" paragraphs of data using similar techniques to
the above mail header example. This script will let you 'grep' whole
paragraphs (empty-line-delimited).
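#!/bin/sh
# sgrep.sh: like grep, but matches and prints whole paragraphs
# (blocks of lines separated by empty lines).
if [ $# -eq 0 ] || [ "$1" = "-h" ]; then
    echo "usage: $0 [-v] pattern [files]" >&2
    exit 1
fi
func='!d'   # default: delete paragraphs that do NOT match the pattern
if [ "$1" = "-v" ]; then
    # support '-v' like 'grep -v': delete paragraphs that DO match
    func='d'
    shift
fi
pattern="$1"
shift
# Append non-empty lines to the hold space (deleting them unless it's the
# last line); on an empty line or at end of input, swap the finished
# paragraph into the pattern space and test it. The pattern is pasted
# into the script as-is, so it must not contain '/'.
sed -ure '/./{H;$!d;}; '"x;/${pattern}/$func;" "$@"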
Call this 'sgrep.sh', put it somewhere, and make it executable. Let's use it to find paragraphs containing 'Delete' followed later by 'cycle' in FreeBSD's sed manpage:
% man sed | ./sgrep.sh 'Delete .* cycle'
[2addr]d
Delete the pattern space and start the next cycle.
[2addr]D
Delete the initial segment of the pattern space through the first
newline character and start the next cycle.
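To see the core of sgrep.sh without the wrapper, here's the same sed expression on a small made-up input of three paragraphs, looking for 'beta'. Non-empty lines pile up in the hold space via H; each empty line (or the last line) triggers x, which brings the finished paragraph into the pattern space for the /beta/!d test. Because H always adds a newline first, a printed paragraph starts with an empty line, which keeps matches visually separated. With GNU sed, . also matches those embedded newlines, which is how 'Delete .* cycle' matched the D entry above even though it spans two lines.

```shell
printf 'alpha\none\n\nbeta\ntwo\n\ngamma\nthree\n' \
    | sed -e '/./{H;$!d;}' -e 'x;/beta/!d'
# prints an empty line, then "beta" and "two"
```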
- The 's' function has a 'p' flag, which prints the pattern space only if a substitution was made.
# this:
sed -ne '/foo/ { s/foo/bar/; p; }'
# is the same as
sed -ne 's/foo/bar/p'
- You can insert data into the hold space (or the pattern space) if you really want:
# Print 'Hello there' before the second line
% printf 'one\ntwo\nthree\n' | sed -e '2 { x; s/.*/Hello there/; p; x; }'
one
Hello there
two
three
Among filter tools, sed is an extremely useful one: it often lets you describe what you want to do with your text in a shorter, simpler form than awk or perl. If you wish to venture down the path of unix ninja, sed should be on your list of commands to understand.
Want to really make your eyes hurt? Check out this calculator written entirely in sed.