Both FreeBSD's and Linux's (GNU, really) split(1) tools are fairly slow when you want to split by line count. Splitting by byte count is (or should be) faster, but that will break logs because a byte boundary will almost never land on a log line boundary.
I'd prefer both: the speed of byte-size splitting and the sanity of line-based splitting. To that end, I wrote a small tool to do just that. Yes, there's probably a tool already available that does exactly what I want, but this was only 100 lines of C, so it was quick to write. GNU split's --line-bytes option is mostly what I want, but it's still crap slow.
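To make the idea concrete, here is a minimal sketch of one way to get byte-sized chunks that still end on line boundaries: jump ahead by the target size, then read forward to the next newline and cut there. This is not the C tool described above, just an illustration in Python. The function names `chunk_ranges` and `split_file` are made up for this example, and the five-digit numeric suffix is only an assumption based on the output file names shown later.

```python
import os


def chunk_ranges(path, chunk_bytes):
    """Return (start, end) byte ranges that each end just after a newline.

    Every range is at least chunk_bytes long, except possibly the last one.
    """
    size = os.path.getsize(path)
    ranges = []
    start = 0
    with open(path, "rb") as f:
        while start < size:
            target = start + chunk_bytes
            if target >= size:
                end = size
            else:
                # Jump to the target offset, then finish the line we landed in.
                f.seek(target)
                f.readline()
                end = f.tell()
            ranges.append((start, end))
            start = end
    return ranges


def split_file(path, chunk_bytes, prefix):
    """Write each range to prefix + a 5-digit index; return the file names."""
    names = []
    with open(path, "rb") as src:
        for i, (start, end) in enumerate(chunk_ranges(path, chunk_bytes)):
            name = f"{prefix}{i:05d}"
            src.seek(start)
            remaining = end - start
            with open(name, "wb") as dst:
                # Copy in 1 MiB blocks so memory use stays flat.
                while remaining > 0:
                    buf = src.read(min(remaining, 1 << 20))
                    if not buf:
                        break
                    dst.write(buf)
                    remaining -= len(buf)
            names.append(name)
    return names
```

Something like `split_file("access", 20971520, "./split.fast.")` would then produce chunks of a little over 20 MiB each. The point of the design is that only the tail of each chunk is scanned for a newline; the rest is a plain block copy, with no per-line work.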
Here's a comparison between my tool and GNU split, run on the fastest workstation I have access to. My tool runs roughly 3.5 to 4 times faster than GNU split for this task.
    # Source file is 382 megs, which is tiny.
    % du -hs access
    382M    access

    # (Fast) Split into approximately 20meg chunks while preserving lines.
    % time ./fastsplit -b 20971520 -p ./split.fast. access
    real    0m1.260s
    user    0m0.018s
    sys     0m1.242s

    # (GNU) Split into 87000-line chunks, no guarantee on size.
    % time split -l 87000 access split.normal.
    real    0m4.943s
    user    0m0.395s
    sys     0m2.440s

    # (GNU) Split into 20mb (max) chunks, preserve lines.
    % time split --line-bytes 20971520 access split.normal_bytes.
    real    0m4.391s
    user    0m0.001s
    sys     0m1.779s

You can see that the actual 'system' time is somewhat close (mine wins by about 0.5s against --line-bytes), but 'real' time is much longer for Linux's split(1). My solution is really good if you want to quickly split logs for parallel processing and you care less about how many lines each chunk has than about getting chunks of roughly N bytes.
What's the output look like?
| fast split | GNU split -l | GNU split --line-bytes |
|---|---|---|
| `% wc -l split.fast.*`<br>`86140 split.fast.00000`<br>`81143 split.fast.00001`<br>`92725 split.fast.00002`<br>`...`<br>`91067 split.fast.00016`<br>`86308 split.fast.00017`<br>`84533 split.fast.00018`<br>`1654604 total` | `% wc -l split.normal.*`<br>`87000 split.normal.aa`<br>`87000 split.normal.ab`<br>`87000 split.normal.ac`<br>`...`<br>`87000 split.normal.ar`<br>`87000 split.normal.as`<br>`1604 split.normal.at`<br>`1654604 total` | `% wc -l split.normal_bytes.*`<br>`85973 split.normal_bytes.aa`<br>`80791 split.normal_bytes.ab`<br>`92363 split.normal_bytes.ac`<br>`...`<br>`86141 split.normal_bytes.ar`<br>`85665 split.normal_bytes.as`<br>`3999 split.normal_bytes.at`<br>`1654604 total` |
| `% du -hs split.fast.*`<br>`21M split.fast.00000`<br>`21M split.fast.00001`<br>`21M split.fast.00002`<br>`...`<br>`21M split.fast.00016`<br>`21M split.fast.00017`<br>`20M split.fast.00018` | `% du -hs split.normal.*`<br>`21M split.normal.aa`<br>`22M split.normal.ab`<br>`19M split.normal.ac`<br>`...`<br>`21M split.normal.ar`<br>`21M split.normal.as`<br>`352K split.normal.at` | `% du -hs split.normal_bytes.*`<br>`21M split.normal_bytes.aa`<br>`21M split.normal_bytes.ab`<br>`21M split.normal_bytes.ac`<br>`...`<br>`21M split.normal_bytes.ar`<br>`21M split.normal_bytes.as`<br>`896K split.normal_bytes.at` |
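The `wc -l` totals are identical across all three runs, which is the property worth checking after any split: no lines lost, and none cut in half. Here is a small sketch of that check in Python. The helper name `check_chunks` is made up for this example. It assumes the chunk files share a prefix, sort into the right order by name, and that the original file itself does not match the prefix.

```python
import glob


def check_chunks(original, prefix):
    """Compare the chunks, in name order, against the original file.

    Returns (chunk_count, total_lines). Raises ValueError if the bytes differ
    or if any chunk other than the last does not end with a newline.
    """
    names = sorted(glob.glob(glob.escape(prefix) + "*"))
    total_lines = 0
    with open(original, "rb") as src:
        for name in names:
            last = b""
            with open(name, "rb") as chunk:
                while True:
                    buf = chunk.read(1 << 20)
                    if not buf:
                        break
                    # The chunk must match the next bytes of the original.
                    if src.read(len(buf)) != buf:
                        raise ValueError(f"{name} differs from {original}")
                    total_lines += buf.count(b"\n")
                    last = buf[-1:]
            if name != names[-1] and last != b"\n":
                raise ValueError(f"{name} does not end on a line boundary")
        if src.read(1):
            raise ValueError("chunks are shorter than the original")
    return len(names), total_lines
```

For the runs above, `check_chunks("access", "split.fast.")` would be the call to make. It reads both the original and the chunks once, so it costs about as much as the split itself.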