Jordan Sissel
geek

Sat, 19 May 2007

sed - Week of Unix Tools; Day 1

Intro

I think it's fair to say that not enough people know sed, mostly because it looks scary. This week-of-unix-tools is intended to be a high concentration of information with little fluff. I'll be covering only the GNU versions of the tools; picking a single version keeps things sane.

What is sed?

Sed is short for 'stream editor' and basically lets you do lots of things to streams of text.

Basic usage and Invocation

sed [-urn] [-e 'sedscript'] [file1 file2 ...]
-u means unbuffered (read as little input as possible and flush output after every line; BSD sed spells this -l), -r means use extended regex, -n silences default output, and -e adds a sed script to run. There are other flags (such as -f) but I never use them. See the man page for more information.
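
Here's a quick sketch of what -n actually changes. The two-line input is made up for illustration. Without -n, sed prints every line once by default, so a line that also hits a p comes out twice:

```shell
# Hypothetical two-line input, used only for illustration.
input=$(printf 'alpha\nbeta')

# Default output plus an explicit p: the matching line is printed twice.
with_default=$(printf '%s\n' "$input" | sed -e '/alpha/p')

# -n silences the default output, so only the explicit p prints.
with_n=$(printf '%s\n' "$input" | sed -n -e '/alpha/p')

printf '%s\n' '--- without -n:' "$with_default" '--- with -n:' "$with_n"
```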

If you've ever seen the perlism s/foo/bar/, that came from sed. Sed is basically a string processing language. The language consists of a very small grammar, but is still very powerful. Here are some examples:

Simple text replacement.
% echo "Hello there foo" |  sed -e 's/foo/bar/'
Hello there bar
Grep-like behavior.
% sed -ne '/FreeBSD/p' /etc/motd
FreeBSD 6.2-PRERELEASE (FOO) #0: Sat Nov 11 00:12:52 EST 2006
Welcome to FreeBSD!
Grep '-v' like behavior
% printf 'foo\nbar\nbaz\nfoobar\n' | sed -ne '/foo/!p'
bar
baz

Backreferences

A backreference reuses a captured group's matched value later in your pattern or replacement. You group regexp patterns with parentheses, but in non-extended mode (ie; without -r), you must escape them. Example:
% echo "hello world" | sed -e 's/\([a-z]*\) world/\1 sed/'
hello sed

# Now with -r:
% echo "hello world" | sed -re 's/([a-z]*) world/\1 sed/'
hello sed
There is a special "reference" available in substitutions (s///): the ampersand (&). It expands to the entire matched text:
% echo "hello world" | sed -e 's/.*/I say, "&"/'
I say, "hello world"
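
A couple more backreference sketches, assuming GNU sed with -r. The input strings are made up; the first swaps two captured words, the second uses & to wrap each match:

```shell
# Swap two words using two capture groups (-r: extended regex, no escaping).
swapped=$(printf 'hello world\n' | sed -re 's/([a-z]+) ([a-z]+)/\2 \1/')
printf '%s\n' "$swapped"

# & stands for the whole match; here it wraps every run of digits in brackets.
bracketed=$(printf 'port 80 and 443\n' | sed -re 's/[0-9]+/[&]/g')
printf '%s\n' "$bracketed"
```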

Syntax and Functions

Sed syntax is pretty straightforward. A general expression will look like this:

address[,address]function

That's it. Expressions are separated by newlines or semicolons.

What is an address?

An address is a way to indicate a location in your data stream. An address can be any of:
  1. A line number (eg 1). The first line is '1'
  2. A regexp match expression, such as /foo/.
  3. The literal '$', which means 'last line of file'
  4. Nothing at all, which means "every line in the file"
If you specify two addresses, the range is inclusive: it covers the first address, the last address, and every line in between. After the last address is hit, the first address is searched for again further down the file. More on this later.
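
The address forms above can be sketched like this. The input is made up (two BEGIN..END blocks with other lines around them):

```shell
# Hypothetical input: two BEGIN..END blocks with other lines around them.
input=$(printf 'a\nBEGIN\nb\nEND\nc\nBEGIN\nd\nEND\ne')

# Line-number range: lines 2 through 4, inclusive.
by_number=$(printf '%s\n' "$input" | sed -n -e '2,4p')

# Regexp range: once END closes a range, sed looks for BEGIN again,
# so both blocks are printed.
by_regexp=$(printf '%s\n' "$input" | sed -n -e '/BEGIN/,/END/p')

# $ addresses the last line of input.
last=$(printf '%s\n' "$input" | sed -n -e '$p')

printf '%s\n' "$by_number" '---' "$by_regexp" '---' "$last"
```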

What are functions?

Functions are always one-letter in sed. The useful ones (to me) are:
  • p (print)
  • s (substitute)
  • d (delete)
  • x (swap pattern and hold buffer)
  • h and H (copy and append to hold buffer)
  • ! (apply the next function against lines not matched)
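
To get a feel for the hold buffer, here's a small sketch that uses x, p and h to print the line right before each match, followed by a one-liner combining ! and d. The sample input is made up:

```shell
# Print the line that comes right before each line matching /foo/.
#   /foo/ { x; p; x; }  on a match: swap in the saved previous line,
#                       print it, then swap the current line back
#   h                   on every line: save the current line for next time
before=$(printf 'one\ntwo\nfoo\nthree\n' | sed -n -e '/foo/ { x; p; x; }; h')
printf '%s\n' "$before"

# d deletes lines; ! inverts the address, so this keeps only /foo/ lines.
kept=$(printf 'one\nfoo\nthree\n' | sed -e '/foo/!d')
printf '%s\n' "$kept"
```

If the very first line matches, the first sketch prints an empty line, because nothing has been saved in the hold buffer yet.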

What can I do with sed?

Print the first line of input (same as head -n 1)
sed -ne 1p
Print everything *except* the first line
sed -ne '1!p' # print everything not on the first line
or
sed -e '1d'   # delete the first line
              # default action is to print, so everything else is printed
Print the first non-whitespace, non-comment line in httpd.conf
sed -ne '/^[^# ]/{p;q;}' httpd.conf
or
sed -ne '/^#/! { /^ *$/! { p;q; }; }' httpd.conf
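
The two httpd.conf commands above are not quite interchangeable: the first one also skips lines that start with a space, so they disagree on an indented directive. A small sketch with a made-up config shows the difference:

```shell
# Hypothetical config: a comment, a blank line, an indented directive,
# then an unindented one.
conf=$(printf '# comment\n\n    Listen 80\nServerRoot "/etc/httpd"')

# First form: the line must start with something other than '#' or a space.
first=$(printf '%s\n' "$conf" | sed -ne '/^[^# ]/{p;q;}')

# Second form: skip comments and blank lines, accept everything else.
second=$(printf '%s\n' "$conf" | sed -ne '/^#/! { /^ *$/! { p;q; }; }')

printf 'first form:  %s\nsecond form: %s\n' "$first" "$second"
```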
Show only 'Received:' headers in a mail
% cat mymail \
  | sed -ne '/^[A-Za-z0-9]/ { x; /^Received: /{p;}; }; /^[A-Za-z0-9]/!H' 
Received: from localhost (localhost [127.0.0.1])
        by whitefox.csh.rit.edu (Postfix) with ESMTP id 731F81145C
        for <email-snipped>; Sat, 19 May 2007 01:19:30 -0400 (EDT)
Received: from whitefox.csh.rit.edu ([127.0.0.1])
        by localhost (whitefox.csh.rit.edu [127.0.0.1]) (amavisd-new, port 10024)
        with ESMTP id EURHKUeHSrao for <email-snipped>;
        Sat, 19 May 2007 01:19:16 -0400 (EDT)
... etc ...
  
Noisy code, eh? It gets the job done though. There are two checks here. The first checks whether the line starts with a letter or number. If so, it swaps the line into the "hold" buffer, then checks whether what came out of the hold buffer starts with 'Received:' and prints it if it does. The side effect is that the current input line is now in the hold buffer and the old header "line" is in the pattern space, which we discard. The second check handles lines that do *not* start with a letter or number, in which case we append the input (aka pattern space) to the hold space.

Basically, we build the current header (which can be multiple lines) in the hold buffer until the next header happens.
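
Here's the same command run against a tiny made-up message, so you can follow the buffers by hand. Note that a header is only printed once the next header line arrives, which is why the sample puts a Subject line after the Received header:

```shell
# Hypothetical message headers; the Received header continues on a second,
# tab-indented line.
mail=$(printf 'From: a@example.com\nReceived: from x\n\tby y\nSubject: hi')

received=$(printf '%s\n' "$mail" \
  | sed -ne '/^[A-Za-z0-9]/ { x; /^Received: /{p;}; }; /^[A-Za-z0-9]/!H')

# Prints the Received header together with its continuation line.
printf '%s\n' "$received"
```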
Output a file, but color matched patterns.
# The '^[' below are raw escape characters, entered at the shell 
# with CTRL+V and hitting escape.
% dmesg | sed -e 's/ath0/^[[33m&^[[0m/g'
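
If typing raw escape characters is a pain, one alternative is to let printf produce the escape for you. This is a sketch, not the only way to do it; the highlight function name is made up for illustration:

```shell
# Build the escape character with printf instead of typing it raw.
esc=$(printf '\033')

# Hypothetical helper: wrap every match of the pattern in $1 in yellow.
# The pattern is pasted into the sed script, so it must not contain '/'.
highlight() {
  sed -e "s/$1/${esc}[33m&${esc}[0m/g"
}

colored=$(printf 'ath0: link up\n' | highlight 'ath0')
printf '%s\n' "$colored"
```

With that defined, the dmesg example becomes: dmesg | highlight ath0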

Use sed to make a 'section grep' tool

You can use sed to "grep" paragraphs of data using similar techniques to the above mail header example. This script will let you 'grep' whole paragraphs (empty-line-delimited).
#!/bin/sh

re="$1"
shift
[ "$#" -eq 0 ] && set -- -

sed -rne '/^$/!H; /^$/ { x; /'"$re"'/p; }; ${ x; /'"$re"'/p; d; }' "$@"
Call this 'sgrep.sh', put it somewhere, and make it executable. Let's use it to find anything with 'Delete' and 'cycle' in FreeBSD's sed manpage:
% man sed | ./sgrep.sh 'Delete .* cycle' 

     [2addr]d
             Delete the pattern space and start the next cycle.

     [2addr]D
             Delete the initial segment of the pattern space through the first
             newline character and start the next cycle.
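
If you want to poke at it without a manpage handy, here's the same sed program wrapped in a shell function and fed a made-up two-paragraph input. Because paragraphs are collected with H, each match comes out with a leading empty line, just like the output above:

```shell
# Same sed program as sgrep.sh, wrapped in a function for a quick test.
# Like the script, it pastes the pattern into the sed program, so the
# pattern must not contain '/'.
sgrep() {
  re="$1"
  shift
  [ "$#" -eq 0 ] && set -- -
  sed -rne '/^$/!H; /^$/ { x; /'"$re"'/p; }; ${ x; /'"$re"'/p; d; }' "$@"
}

# Hypothetical input: two paragraphs separated by an empty line.
found=$(printf 'alpha one\nalpha two\n\nbeta one\nbeta two\n' | sgrep 'beta')
printf '%s\n' "$found"
```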

Bonus notes

  • The 's' function has a 'p' flag, which prints only if a substitution was made.
    # this:
    sed -ne '/foo/ { s/foo/bar/; p }'
    
    # is the same as
    sed -ne 's/foo/bar/p'
    
  • You can insert data into the hold space (or the pattern space) if you really want:
    # Print 'Hello there' before the second line
    % printf 'one\ntwo\nthree\n' | sed -e '2 { x; s/.*/Hello there/; p; x; }'
    one
    Hello there
    two
    three
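
To convince yourself that the 'p' flag form and the longer braces form from the first note really do the same thing, here's a quick sketch on made-up input:

```shell
# Hypothetical input: two lines contain foo, one does not.
input=$(printf 'foo one\nbaz two\nfoo three')

# Long form: match, substitute, then print explicitly.
long_form=$(printf '%s\n' "$input" | sed -ne '/foo/ { s/foo/bar/; p }')

# Short form: the p flag prints only when a substitution happened.
short_form=$(printf '%s\n' "$input" | sed -ne 's/foo/bar/p')

printf '%s\n' "$long_form" '---' "$short_form"
```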
    

Ok, now what?

Given your choice of filter tools, sed is an extremely useful one that often allows you to describe what you want to do with your text in a shorter, simpler form than awk or perl can offer you. If you wish to venture down the path of unix ninja, then sed should be on your list of commands to understand.

Want to really make your eyes hurt? Check out this calculator written entirely in sed.


Permalink: /articles/week-of-unix-tools/day-1-sed
posted at: 05:38

