Short version: XPath query tool for XML and HTML.
Long version: swanky frontend to xsltproc that takes an XPath query and some content and spits out the matching results.
Dependencies: xsltproc (comes with libxslt), xmllint (comes with libxml2).
Download: xpathtool-20071102.tar.gz
Options:
- --ihtml
  Treat the input as HTML.
- --otext
  Output text. Implemented as <xsl:value-of select="." />.
- --oxml
  Output XML.
- --ohtml
  Output HTML.
- --indent (default) or --noindent
  Set whether XML or HTML output is indented by depth.
- --stripspace=XXX
  Define elements whose content should be space-stripped. Implemented with <xsl:strip-space>.
- --pretty (default) or --nopretty
  Pretty-print XML and HTML output by filtering it through 'xmllint --format'.
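To make the "frontend to xsltproc" idea concrete, here is a minimal sketch of how a text-output query could be turned into a stylesheet. This is not the code in xpathtool.sh: make_stylesheet is a hypothetical helper, and the stylesheet layout is an assumption built only from the <xsl:value-of select="." /> and <xsl:strip-space> hints above.

```shell
#!/bin/sh
# Sketch only: make_stylesheet is a hypothetical helper, not part of xpathtool.sh.
# It prints an XSLT 1.0 stylesheet that emits the string value of every node
# matched by the XPath in $1, one per line.
# $2 (optional) is a space-separated list of element names for <xsl:strip-space>.
# Caveat: the query is pasted into a double-quoted attribute, so it must not
# contain '"', '<' or '&'.
make_stylesheet() {
    xpath=$1
    strip=
    if [ -n "${2:-}" ]; then
        strip="<xsl:strip-space elements=\"$2\"/>"
    fi
    cat <<EOF
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  $strip
  <xsl:template match="/">
    <xsl:for-each select="$xpath">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF
}
```

With the function defined, something like "make_stylesheet '//link' > query.xsl" followed by "xsltproc query.xsl feed.xml" would run the query; xsltproc's --html switch parses the input as HTML, and a file name of - reads from stdin.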
% GET feeds.technorati.com/wtf | ./xpathtool.sh '//link' | tail -3
http://technorati.com/wtf/we-can-take-our-country-back/2007/05/16/ron-paul-is-standing-up-tot-the-establishment-1
http://technorati.com/wtf/giuliani-is-deluded/2007/05/16/delusional-and-out-of-touch-with-reality-1
http://technorati.com/wtf/macbook/2007/05/16/apples-rule-1
Slashdot is worthless. The article writeups are worthless. The comments are worthless. The users are worthless.
Sometimes, the linked content is not. Let's pull out all the links in all the articles on the front page:
# slashdot articles are inside the following html element
% xbase="//div[@class='article']//div[@class='intro']/i"
% GET www.slashdot.org | ./xpathtool.sh --ihtml "$xbase//a/@href|$xbase//a/text()" | paste -d" " - -
http://www.foreignpolicy.com/story/cms.php?story_id=3807 the world's biggest digital dump
http://googleblog.blogspot.com/2007/05/google-apps-partner-edition.html turn over their entire email operation to Google
http://apcmag.com/6138/the_dark_side_of_google_apps_for_isps the dark side of Google's offer
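The paste -d" " - - at the end of that pipeline is what puts each URL next to its link text: given - twice, paste reads stdin alternately for its two columns, so every pair of consecutive lines is joined with a space. A small standalone sketch (pair_lines is a hypothetical name, and the sample input is made up):

```shell
#!/bin/sh
# Join every two consecutive input lines into one, separated by a space.
# Each "-" makes paste take its next line from stdin, so the two columns are
# filled from alternating lines.
pair_lines() {
    paste -d" " - -
}

# Made-up sample input: a URL line followed by its link-text line.
printf '%s\n' \
    'http://example.com/1' 'first link' \
    'http://example.com/2' 'second link' | pair_lines
# prints:
#   http://example.com/1 first link
#   http://example.com/2 second link
```

The pairing assumes the query yields exactly one href and one text node per link, in that order. A link with no text, or with text split around a child element, would shift every pair after it.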
xmlns='http://www.w3.org/2005/Atom'
and I can't see any of the title elements, because they are not in the XSL default namespace.
This is probably me being dim because I'm still learning XPath/XSL; thanks in advance if you have an answer.
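On the namespace question: in XPath 1.0 an unprefixed name such as title only matches elements in no namespace, so //title finds nothing in a document whose elements sit in the Atom namespace. One common workaround that needs no prefix declaration is to test local-name() instead. A minimal sketch, where any_ns is a hypothetical helper and it is an assumption that xpathtool.sh passes the expression through to the stylesheet unchanged:

```shell
#!/bin/sh
# Hypothetical helper: build an XPath that matches elements named $1 in any
# namespace (or none) by comparing local-name() rather than the qualified name.
any_ns() {
    printf "//*[local-name()='%s']" "$1"
}

# Possible use (assumes the tool accepts the query as-is):
#   GET some.atom.feed | ./xpathtool.sh "$(any_ns title)"
```

The trade-off is that local-name() also matches title elements from other namespaces in the same document, which may or may not be what you want.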