gzip and other file progress checking

While waiting for a MySQL backup to replay against a test server, I wondered how far along the replay was.

I am building the database using this command:

% gzip -dc admin.sql.20100511.060001.gz| mysql -uroot -proot

Pipes have a finite buffer. That is, the mysql command is busy reading stuff from stdin while gzip is busy writing to stdout. If gzip outputs faster than mysql can consume, gzip will end up filling the pipe's buffer and its next write will block, pausing it momentarily until mysql can catch up.

If we can inspect what position gzip is currently at, we can use that data along with the input file size to give us a progress indicator.

In Linux 2.6.22 and later, /proc/<pid>/fdinfo reports the current seek position of every open file in a process, so let's use that. First we need to find which file descriptor number refers to the input file, then ask what position it is at.
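The lookup described above could be automated along these lines. This is only a sketch: the function name find_fd is made up for illustration, and it assumes Linux with /proc mounted, GNU readlink (for the -f flag), and permission to read the target process's fd directory (same user or root).

```shell
# find_fd PID PATH
# Print the number of the first file descriptor in process PID that points at PATH.
# Hypothetical helper; returns non-zero if no descriptor matches.
find_fd() {
  # Resolve symlinks so the comparison uses the same canonical path /proc reports.
  target=$(readlink -f "$2") || return 1
  for link in "/proc/$1/fd"/*; do
    if [ "$(readlink "$link")" = "$target" ]; then
      basename "$link"
      return 0
    fi
  done
  return 1
}
```

Something like `find_fd 6261 ~/admin.sql.20100511.060001.gz` would then replace reading the `ls -l fd` listing by eye. Doing the same thing by hand looks like this: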

% pgrep -f 'gzip -dc admin'
6261
% PID=6261
% cd /proc/$PID/
% ls -l fd
total 0
lrwx------ 1 jsissel jsissel 64 May 11 15:04 0 -> /dev/pts/12
l-wx------ 1 jsissel jsissel 64 May 11 15:04 1 -> pipe:[31912237]
lrwx------ 1 jsissel jsissel 64 May 11 15:04 2 -> /dev/pts/12
lr-x------ 1 jsissel jsissel 64 May 11 15:04 3 -> /home/jsissel/admin.sql.20100511.060001.gz 
% cat fdinfo/3
pos:    149028864
flags:  0104000

# Get the file size
% size=$(stat -L -c "%s" fd/3)

# Get the position
% pos=$(awk '/pos:/ { print $2 }' fdinfo/3)

% echo "$pos/$size" | bc -l
.25203137570698241973
The above output says we are about 25% complete.
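If you check this often, the steps above can be wrapped in a pair of small functions. This is a rough sketch rather than a polished tool: the names progress_pct and fd_progress are hypothetical, and fd_progress assumes Linux /proc and GNU stat, same as the commands above.

```shell
# progress_pct POS SIZE
# Print percent complete with one decimal place, or "unknown" if SIZE is not positive.
progress_pct() {
  awk -v pos="$1" -v size="$2" 'BEGIN {
    if (size <= 0) { print "unknown"; exit }
    printf "%.1f\n", 100 * pos / size
  }'
}

# fd_progress PID FD
# Look up the file size and current seek position for descriptor FD of process PID.
fd_progress() {
  size=$(stat -L -c '%s' "/proc/$1/fd/$2") || return 1
  pos=$(awk '/^pos:/ { print $2 }' "/proc/$1/fdinfo/$2") || return 1
  [ -n "$pos" ] || return 1
  progress_pct "$pos" "$size"
}
```

With the process above you would call `fd_progress 6261 3`. The arithmetic is split out so it can be checked without a live process: `progress_pct 25 100` prints 25.0.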

You could go a step further and include the process uptime to show an estimate of time remaining.

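# Work from the gzip process's /proc directory
cd /proc/$PID
# Input file size and current read position, in bytes
size=$(stat -L -c "%s" fd/3)
pos=$(awk '/pos:/ { print $2 }' fdinfo/3)
# Process start time and current time, in seconds since the epoch
start=$(date -d "$(ps -p $PID -o lstart=)" +%s)
now=$(date +%s)
echo "Minutes elapsed:"
echo "($now - $start) / 60" | bc -l
# Projected total time is elapsed / fraction done; subtract elapsed to get what remains
echo "Minutes remaining (estimated):"
echo "((($now - $start) / ($pos / $size)) - ($now - $start)) / 60" | bc -l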

# Output
Minutes elapsed:
55.81666666666666666666
Minutes remaining (estimated):
110.58584668301847161105
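The estimate above simplifies to elapsed * (size - pos) / pos, which is easier to reuse as a function. A minimal sketch, assuming the read rate stays constant for the rest of the run (it usually will not); the name eta_minutes is hypothetical.

```shell
# eta_minutes ELAPSED_SECONDS POS SIZE
# Linear extrapolation: if POS bytes took ELAPSED_SECONDS, estimate how many
# minutes the remaining (SIZE - POS) bytes will take at the same rate.
eta_minutes() {
  awk -v elapsed="$1" -v pos="$2" -v size="$3" 'BEGIN {
    # No progress yet (or no size) means there is no rate to extrapolate from.
    if (pos <= 0 || size <= 0) { print "unknown"; exit }
    printf "%.1f\n", elapsed * (size - pos) / pos / 60
  }'
}
```

For example, `eta_minutes 600 25 100` prints 30.0: a quarter of the file took ten minutes, so the remaining three quarters should take thirty. In the script above it would be called as `eta_minutes "$((now - start))" "$pos" "$size"`.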
Update: Some commenters pointed out pv (pipe viewer) as a simpler solution here. It looks pretty good.

5 responses to 'gzip and other file progress checking'


Mike wrote at Tue May 11 15:49:27 2010...
It's really funny you should mention this. I was loading a backup of our db onto a new offsite replica today and used a similar method to get the elapsed time.

Been meaning to find some time to script it and make it a bit easier so I can use `watch` and let it go.

Scott wrote at Tue May 11 16:19:04 2010...
Or, you could just use pipe viewer http://www.ivarch.com/programs/pv.shtml

Twirrim wrote at Tue May 11 17:02:11 2010...
pv saves a lot of hassle, gives you nice fancy animated progress bars with ETAs too.

From a straight .sql file it's a one step process:

pv source.sql | mysql -u user -p database

or with gzip'd files:

pv source.sql.gz | gunzip - | mysql -u user -p database

Twirrim wrote at Tue May 11 17:05:46 2010...
Hmm.. missed half my comment there.

It's worth pointing out that any ETA produced is going to be inaccurate. MySQL database dumps consist of a series of massive single-line inserts. One line might take 10 seconds, another 10 minutes. So many variables affect how fast inserts run that it's impossible to accurately predict how much time remains. Progress bars are useless in that respect too: the last 10% could take 90% of the time!

Jordan Sissel wrote at Tue May 11 17:38:24 2010...
Didn't know about pv. Thanks for the tip :)

