data sources - Week of Unix Tools; Day 4


This week-of-unix-tools series is intended to be a high concentration of information with little fluff. For sanity's sake, I'll be covering only the GNU versions of the tools.

Data, where are you?

Data comes from lots of places. Loosely categorized, it comes from three places:
  1. Files and devices
  2. Output of other tools
  3. The network (via other tools)


Cat means 'concatenate'. It is mostly useful for doing a few things:
  • Cat lots of files together; e.g. 'cat *.c' for processing by another tool, or generally gluing data sets (from files) together.
  • Make a shell script more readable by making the input more obvious
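A quick sketch of both uses; the file names here are made up:
glue several data sets together and count unique lines
cat jan.log feb.log mar.log | sort | uniq -c
make the data source obvious at the front of a pipeline
cat names.txt | xargs -n1 echo hello
The second cat is technically redundant ('xargs -n1 echo hello < names.txt' does the same thing), but putting cat up front makes it clear at a glance where the data comes from.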


Netcat (nc) basically gives you the ability to talk TCP and UDP from the shell. You send data using standard input and receive data from standard output. Simple. In the examples below, example.com stands in for whatever host you want to talk to.
tcp client (connect to port 80)
nc example.com 80
tcp server (listen on port 8080)
nc -l 8080
udp client (connect to port 53)
nc -u example.com 53
udp server (listen on port 5353)
nc -l -u 5353
Basic HTTP request
% echo "GET / HTTP/1.0\n" | nc example.com 80 | head -1
HTTP/1.0 200 OK
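Because nc just shuttles standard input and standard output across the socket, it also makes a crude file-transfer tool. A sketch with a made-up port and file name; depending on your netcat flavor, the sender may need -N or -q 0 to close the connection once it hits end of input:
receiving end (listen on port 9000, write to a file)
nc -l 9000 > backup.tar.gz
sending end (connect and push the file)
nc receiver.example.com 9000 < backup.tar.gz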


openssl is a command that any unix-like system will probably have installed. The command itself can do many, many things, but for this article I'll only cover the s_client subcommand.

'openssl s_client' is essentially 'netcat + ssl'. This tool is extremely useful for debugging text-based protocols behind SSL, such as NNTP over SSL, IMAPS, and HTTPS.

Open an https connection to a web server and make a basic request (example.com:443 is a placeholder for the real host and port)
% echo "GET / HTTP/1.0\r\n\r\n" \
| openssl s_client -quiet -connect example.com:443 \
| col \
| sed -e '/^$/q'
depth=3 /C=BE/O=GlobalSign nv-sa/OU=Root CA/CN=GlobalSign Root CA
verify error:num=19:self signed certificate in certificate chain
verify return:0
HTTP/1.1 302 Found
Date: Fri, 25 May 2007 10:07:25 GMT
Server: Apache/2.0.52 (Red Hat)
Content-Length: 293
Keep-Alive: timeout=300, max=1000
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1
* The 'col' command will strip the \r (carriage return) characters from the http response, allowing sed's /^$/ to match an empty line (end of headers).
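s_client works the same way for the other SSL'd text protocols mentioned above. A sketch of poking at an IMAP-over-SSL server; the hostname is a placeholder, and a1/a2 are just arbitrary IMAP command tags typed interactively once connected:
% openssl s_client -quiet -connect imap.example.com:993
a1 CAPABILITY
a2 LOGOUT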


You can query webservers (http) with any number of tools and you'll get the raw source or data for any page you query. This is really useful.
  • GET, POST, lwp-request, et al., which come with libwww-perl
  • curl
  • wget
  • fetch (FreeBSD)
Most of the time, when I need to fetch pages to stdout, I use GET, because it's less typing. Here are some examples of the above commands:
Fetch / from a host (example.com as a placeholder) and output the page to stdout
  • GET http://example.com/
  • wget -O - -q http://example.com/
  • fetch -q -o - http://example.com/
  • curl http://example.com/
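As a sketch of why fetching to stdout is handy, here's a made-up pipeline (placeholder URL) that pulls a page and counts the href attributes in it:
% GET http://example.com/ | grep -o 'href=' | wc -l
grep -o prints each match on its own line, so wc -l counts occurrences rather than matching lines.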


But what if you don't want the raw html from a webpage? You can have w3m and lynx do some basic rendering for you, also to stdout. I recommend w3m instead of lynx, but use whatever (example.com below is, again, a placeholder):
  • w3m -dump http://example.com/
  • lynx -dump http://example.com/
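A sketch of where the rendered dump helps: search the page's visible text instead of its markup (URL and search string are made up):
% w3m -dump http://example.com/ | grep -i 'more information'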


ssh can be a data source too. Run a command on 1000 machines and process the output locally, for fun and profit.

Log in to N systems and get uptime. Prefix the output with the hostname
% echo "fury\ntempest" \
| xargs -n1 -I@ sh -c 'ssh @ "uptime" | sed -e "s/^/@/"'
fury  6:18am  up  2:25,  1 user,  load average: 0.06, 0.04, 0.04
tempest 06:18:00 up  9:01,  2 users,  load average: 0.12, 0.09, 0.09
Combining xargs and ssh gives you a powerful ability to execute commands on multiple machines easily, even in parallel.
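With GNU xargs you can add -P to run the ssh sessions concurrently. A sketch using the same two hosts; -P2 runs both at once (this assumes GNU xargs, which supports -P):
% echo "fury\ntempest" \
| xargs -P2 -I@ sh -c 'ssh @ "uptime" | sed -e "s/^/@/"'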