gzip and other file progress checking

While waiting for a mysql backup to replay against a test server, I wondered how far along the replay was.

I am building the database using this command:

% gzip -dc admin.sql.20100511.060001.gz | mysql -uroot -proot

Pipes have a finite buffer. mysql is busy reading from stdin while gzip is busy writing to stdout. If gzip produces output faster than mysql can consume it, gzip eventually fills the pipe's buffer, and its next write blocks until mysql catches up. Because gzip can never get more than about a pipe buffer's worth of data ahead of mysql, how far gzip has read into its input file is a good proxy for how far mysql has gotten.
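To see this blocking in action, here is a small experiment. It is a sketch that assumes Linux's default pipe capacity of 64 KiB (per pipe(7), since Linux 2.6.11), and the function name `pipe_block_demo` is made up for illustration. A writer pushes some bytes into a pipe whose reader sleeps before draining it, and the writer reports on stderr how many whole seconds passed before it finished writing.

```shell
# pipe_block_demo BYTES DELAY (hypothetical helper, for illustration only)
# Writes BYTES zero bytes into a pipe whose reader sleeps DELAY seconds before
# draining it, then prints on stderr how many whole seconds the writes took.
pipe_block_demo() {
  local bytes=$1 delay=$2 start
  start=$(date +%s)
  {
    head -c "$bytes" /dev/zero
    # Reached only once head has written everything, i.e. after any blocking
    echo "$(( $(date +%s) - start ))" >&2
  } | { sleep "$delay"; cat > /dev/null; }
}
```

Running `pipe_block_demo 1024 2` should report 0 (or 1 if a second boundary happens to pass), because 1 KiB fits in the buffer and the writer exits immediately. Running `pipe_block_demo 1048576 2` should report about 2, because the writer stalls until the reader wakes up and starts draining.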

So if we can find gzip's current read position in its input file, we can divide it by the file's size to get a progress indicator.

Linux 2.6.22 and later provide /proc/<pid>/fdinfo/<fd>, which reports the current file offset (the "pos" field) of each file descriptor a process has open, so let's use that. First we need to find which file descriptor number refers to the input file, then ask for its position.
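Finding the descriptor can also be scripted instead of read off an `ls -l` listing by eye. A minimal sketch, assuming GNU or busybox `readlink` (for `-f`); `find_fd` is a hypothetical helper name. It compares each /proc/<pid>/fd symlink target with the canonical path of the file you care about:

```shell
# find_fd PID PATH (hypothetical helper name)
# Prints the descriptor number(s) that process PID has open on PATH, by
# comparing each /proc/PID/fd/N symlink target with PATH's canonical form.
find_fd() {
  local target link
  target=$(readlink -f "$2") || return 1
  for link in /proc/"$1"/fd/*; do
    if [ "$(readlink "$link")" = "$target" ]; then
      basename "$link"
    fi
  done
}
```

For the gzip process in this post, `find_fd "$PID" /home/jsissel/admin.sql.20100511.060001.gz` would be expected to print 3.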

% pgrep -f 'gzip -dc admin'
6261
% PID=6261
% cd /proc/$PID/
% ls -l fd
total 0
lrwx------ 1 jsissel jsissel 64 May 11 15:04 0 -> /dev/pts/12
l-wx------ 1 jsissel jsissel 64 May 11 15:04 1 -> pipe:[31912237]
lrwx------ 1 jsissel jsissel 64 May 11 15:04 2 -> /dev/pts/12
lr-x------ 1 jsissel jsissel 64 May 11 15:04 3 -> /home/jsissel/admin.sql.20100511.060001.gz 
% cat fdinfo/3
pos:    149028864
flags:  0104000

# Get the file size
% size=$(stat -L fd/3 -c "%s")

# Get the position
% pos=$(awk '/pos:/ { print $2 }' fdinfo/3)

% echo "$pos/$size" | bc -l
.25203137570698241973
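That means gzip has read about 25% of the compressed file. The position is measured in compressed bytes, so this only approximates how far mysql has gotten, but unless the dump compresses very unevenly it is close enough for a progress indicator.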
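The steps above can be wrapped in a small reusable function. This is a sketch: `fd_progress` is a hypothetical name, it assumes Linux 2.6.22+ and a `stat` that supports `-L -c` (GNU coreutils or busybox), and it does the division in awk rather than bc.

```shell
# fd_progress PID FD (hypothetical helper name)
# Prints, as a percentage, how far process PID's file offset on descriptor FD
# has advanced through the file, using /proc/PID/fdinfo (Linux 2.6.22+).
fd_progress() {
  local size pos
  size=$(stat -L -c %s "/proc/$1/fd/$2") || return 1
  pos=$(awk '/^pos:/ { print $2 }' "/proc/$1/fdinfo/$2") || return 1
  # An empty file has no meaningful progress; avoid dividing by zero
  [ "$size" -gt 0 ] || return 1
  awk -v p="$pos" -v s="$size" 'BEGIN { printf "%.2f\n", 100 * p / s }'
}
```

At the moment captured in the session above, `fd_progress "$PID" 3` would have printed 25.20.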

You could go a step further and use how long the process has been running to estimate the time remaining.

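cd /proc/$PID
# Size of the compressed input file (fd 3) and gzip's current offset in it
size=$(stat -L fd/3 -c "%s")
pos=$(awk '/pos:/ { print $2 }' fdinfo/3)
# Process start time and current time, in seconds since the epoch (GNU date)
start=$(date -d "$(ps -p $PID -o lstart=)" +%s)
now=$(date +%s)
echo "Minutes elapsed:"
echo "($now - $start) / 60" | bc -l
# Total time is roughly elapsed / fraction done; remaining = total - elapsed
echo "Minutes remaining (estimated):"
echo "((($now - $start) / ($pos / $size)) - ($now - $start)) / 60" | bc -l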

# Output
Minutes elapsed:
55.81666666666666666666
Minutes remaining (estimated):
110.58584668301847161105
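The remaining-time estimate assumes the import keeps going at its average rate so far: if a fraction pos/size took `elapsed` seconds, the whole job takes about elapsed / (pos/size), and the remainder simplifies to elapsed * (size - pos) / pos. Here is a sketch of that same calculation as a function (the name `eta_minutes` is hypothetical), using awk instead of bc and guarding against a zero position:

```shell
# eta_minutes ELAPSED_SECONDS POS SIZE (hypothetical helper name)
# Estimates minutes remaining, assuming progress continues at the average
# rate so far: remaining = elapsed * (size - pos) / pos.
eta_minutes() {
  awk -v e="$1" -v p="$2" -v s="$3" 'BEGIN {
    if (p <= 0) { print "unknown"; exit }
    printf "%.1f\n", e * (s - p) / p / 60
  }'
}
```

With the variables from the script above, this would be called as `eta_minutes "$((now - start))" "$pos" "$size"`.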
Update: Some commenters pointed out pv (Pipe Viewer) as a solution here. It looks pretty good.
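A sketch of the pv approach, assuming pv is installed; `gunzip_with_progress` is a hypothetical wrapper name. Putting pv first, on the compressed file, lets it know the total size up front, so it can show a percentage and an ETA on stderr:

```shell
# gunzip_with_progress FILE (hypothetical helper name)
# pv reads the compressed file itself, so it knows the total size and can
# draw a percentage and ETA on stderr while the data flows to stdout.
gunzip_with_progress() {
  pv "$1" | gzip -dc
}
```

For the replay in this post that would be `gunzip_with_progress admin.sql.20100511.060001.gz | mysql -uroot -proot`. Recent pv releases also document a `-d PID[:FD]` (`--watchfd`) option for watching a process that is already running, much like the fdinfo trick above; check `man pv` for your version.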

Revision 2000

I spent some time putting love into cgrok (uses libpcre) tonight.
  • Added a logging facility to help with debugging. It lets you choose which features to log (instead of lame warn/info/numeric log levels)
  • Added string and number comparison predicates
  • Wrote a few more tests which uncovered some bugs
I also broke 2000 revisions in Subversion. Yay.
Sending        test/Makefile
Transmitting file data .
Committed revision 2001.