Jordan Sissel

Fri, 25 May 2007

data sources - Week of Unix Tools; Day 4

Intro

This week-of-unix-tools is intended to be a high concentration of information with little fluff. To keep things sane, I'll cover only the GNU versions of the tools.

Data, where are you?

Data comes from lots of places. Loosely categorized, it comes from three:
  1. Files and devices
  2. Output of other tools
  3. The network (via other tools)

cat

Cat means 'concatenate'. It is mostly useful for doing a few things:
  • Cat lots of files together; e.g. 'cat *.c' for processing by another tool, or generally gluing data sets (from files) together.
  • Make a shell script more readable by making the input more obvious
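
A minimal sketch of both uses. The file names, their contents, and the two function names are invented for the demonstration:

```shell
#!/bin/sh
# Throwaway sample data; file names and contents are invented for this demo.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'apple\nbanana\n' > "$dir/a.txt"
printf 'banana\ncherry\n' > "$dir/b.txt"

# Use 1: glue several files into one stream for the next tool.
count_fruit() {
  cat "$dir"/*.txt | sort | uniq -c
}

# Use 2: lead with cat so the input is the first thing you read,
# instead of hiding the file name at the far end of the pipeline.
first_fruit() {
  cat "$dir/a.txt" | sort | head -1
}

count_fruit
first_fruit
```

The second form costs an extra process compared to 'sort < file', which is usually a fair trade for a pipeline that reads left to right.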

nc

Netcat. Basically gives you the ability to talk TCP and UDP from the shell. You send data on standard input and receive data on standard output. Simple.
tcp client (connect to google.com port 80)
nc google.com 80
tcp server (listen on port 8080; traditional and GNU netcat want 'nc -l -p 8080' instead)
nc -l 8080
udp client (connect to ns1.slashdot.org port 53)
nc -u ns1.slashdot.org 53
udp server (listen on port 5353; traditional and GNU netcat want 'nc -l -u -p 5353' instead)
nc -l -u 5353
Examples:
Basic HTTP request
% printf 'GET / HTTP/1.0\r\n\r\n' | nc google.com 80 | head -1
HTTP/1.0 200 OK
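
Whether echo expands \n depends on the shell, so it can help to build the request with printf, which treats escapes the same way everywhere. A small sketch; the function names are made up here, and the host is whatever you pass in:

```shell
#!/bin/sh
# Hypothetical helpers; the names are made up for this sketch.

# Print a minimal HTTP/1.0 request for path $1. HTTP ends each line with
# CRLF and ends the header block with an empty line, hence \r\n\r\n.
http_request() {
  printf 'GET %s HTTP/1.0\r\n\r\n' "$1"
}

# Send the request to host $1 on port 80 and keep only the status line.
# Needs nc and a network connection, so it is only defined here, not run.
http_status() {
  http_request / | nc "$1" 80 | head -1
}
```

Running 'http_status google.com' should print a status line like the one above; the exact status depends on what the server decides to answer.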

openssl

openssl is a command that any unix-like system will probably have installed. The command itself can do many many things, but for this article I'll only cover the s_client command.

'openssl s_client' is essentially 'netcat + ssl'. This tool is extremely useful for debugging text-based protocols behind SSL such as ssl'd nntp, imaps, and https.

Example:
Open an https connection to addons.mozilla.org
% printf 'GET / HTTP/1.0\r\n\r\n' \
| openssl s_client -quiet -connect addons.mozilla.org:443 \
| col \
| sed -e '/^$/q'
depth=3 /C=BE/O=GlobalSign nv-sa/OU=Root CA/CN=GlobalSign Root CA
verify error:num=19:self signed certificate in certificate chain
verify return:0
HTTP/1.1 302 Found
Date: Fri, 25 May 2007 10:07:25 GMT
Server: Apache/2.0.52 (Red Hat)
Location: http://www.mozilla.com/
Content-Length: 293
Keep-Alive: timeout=300, max=1000
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1
* The 'col' command will strip the \r (carriage return) characters from the http response, allowing sed's /^$/ to match an empty line (end of headers).
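
If col isn't handy, 'tr -d "\r"' is another way to drop the carriage returns. A minimal sketch of the header-trimming step on its own, fed a canned response so it runs without a network; 'headers_only' is a made-up name:

```shell
#!/bin/sh
# Hypothetical helper: read an HTTP response on stdin and print only the
# headers. tr deletes the carriage returns so that sed's /^$/ can match the
# empty line that ends the header block; sed prints that line, then quits.
headers_only() {
  tr -d '\r' | sed -e '/^$/q'
}

# A canned response stands in for the server, so this runs offline.
printf 'HTTP/1.1 302 Found\r\nContent-Length: 293\r\n\r\n<html>body</html>\n' \
  | headers_only
```

Against a live server, the same function can take the place of the col and sed stages: pipe the output of 'openssl s_client -quiet -connect addons.mozilla.org:443' into 'headers_only'.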

GET/curl/wget/fetch

You can query webservers (http) with any number of tools and you'll get the raw source or data for any page you query. This is really useful.
  • GET, POST, lwp-request, et al. Comes with libwww-perl
  • curl
  • wget
  • fetch (FreeBSD)
Most of the time, when I need to fetch pages to stdout, I use GET, because it's less typing. Here are some examples of the above commands:
Fetch / from www.w3schools.com and output page to stdout
  • GET http://www.w3schools.com/
  • wget -O - -q http://www.w3schools.com/
  • fetch -q -o - http://www.w3schools.com/
  • curl http://www.w3schools.com/
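
Since all of these write the page to stdout, the output can go straight into another tool. A rough sketch that pulls out the page title, using canned HTML so it runs offline; 'page_title' is a made-up name and the pattern is deliberately naive:

```shell
#!/bin/sh
# Hypothetical helper: read raw HTML on stdin and print the text of the
# first <title> element. Naive on purpose: it assumes the opening and
# closing tags are lowercase and sit on the same line.
page_title() {
  sed -n -e 's/.*<title>\(.*\)<\/title>.*/\1/p' | head -1
}

# Canned HTML stands in for a fetched page, so this runs offline.
printf '<html><head><title>Example page</title></head></html>\n' | page_title
```

With a real page, something like 'curl -s http://www.w3schools.com/ | page_title' should work the same way; the -s flag keeps curl's progress meter quiet.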

w3m/lynx

But what if you don't want the raw html from a webpage? You can have w3m and lynx do some basic rendering for you, also to stdout. I recommend w3m instead of lynx, but use whatever.
  • w3m -dump http://www.google.com/
  • lynx -dump http://www.google.com/

ssh

ssh can be a data source too. Run a command on 1000 machines and process the output locally, for fun and profit.

Log in to N systems and get uptime, prefixing each line of output with the hostname
% printf 'fury\ntempest\n' \
| xargs -I@ sh -c 'ssh @ "uptime" | sed -e "s/^/@/"'
fury  6:18am  up  2:25,  1 user,  load average: 0.06, 0.04, 0.04
tempest 06:18:00 up  9:01,  2 users,  load average: 0.12, 0.09, 0.09
 
Combining xargs and ssh gives you a powerful ability to execute commands on multiple machines easily, even in parallel.
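
For the parallel case, xargs has a -P flag (a GNU and BSD extension, not POSIX) that caps how many commands run at once. A sketch under a few assumptions: 'each_host' and the RSH variable are names made up here, and echo stands in for ssh so the example contacts no machines:

```shell
#!/bin/sh
# Hypothetical helper (the name and the RSH variable are made up here).
# Reads one hostname per line on stdin and runs command $2 on each host,
# at most $1 hosts at a time, prefixing each output line with the hostname.
# RSH defaults to ssh; point it at something else to dry-run locally.
each_host() {
  RSH="${RSH:-ssh}" CMD="$2" \
    xargs -P "$1" -I@ sh -c '$RSH @ "$CMD" | sed -e "s/^/@ /"'
}

# Dry run: echo stands in for ssh, so each line shows the hostname prefix
# followed by the arguments ssh would have been given.
printf 'fury\ntempest\n' | RSH=echo each_host 2 uptime
```

Drop the RSH override to use ssh for real, e.g. 'each_host 10 uptime' with the host list on stdin. With more than one job running, lines arrive in whatever order the hosts answer, which is why the hostname prefix matters.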

Permalink: /articles/week-of-unix-tools/day-4-data-sources
posted at: 06:21

