I get emails from this site when someone comments.
This morning, this showed up:
Name: Virtual Pharmacy
Email: [snipped]
URL: [snipped]
Hostname: 114.199.36.72.reverse.layeredtech.com (72.36.199.114)
Entry URL: http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2
Comment location: [snipped]
Everyone repeat, what alcohol should be consumed moderately, but what it means? Why to women
recommend to drink more moderately than to men? What is the female alcoholism? WBR LeoP
A quick Google search for the strange trailing token, "WBR LeoP", confirms
that this is comment spam (as if the content didn't give it away).
The url the spammer used points at pharmacynewsblog.com, which looks like a normal blog.
It's not.
The content is entirely viagra-and-friends related, which is fine. However, look at this snippet of visible text from the front page:
Drug treatment may beat psychotherapy at ...
Google for this phrase and you'll find that it's been plagiarized. But deliciously so.
View the page source and you'll see:
<p>Drug <b class=ne>joint pain are </b>treatment <BLINK class=ne>of
purchase </BLINK>may <sup class=ne>wellbutrin at </sup>beat <small
class=ne>and paxil vs </small>psychotherapy
The CSS class 'ne' sets 'display: none', among other properties, so the
browser never renders the injected text.
This is quite clever, and appears automated.
pharmacynewsblog.com seems to be a somewhat autogenerated spam blog that
takes news postings about viagra and the like and injects random html into them,
with the intention of defeating antispam solutions. Anti-spam engines probably
aren't smart enough to know that they should ignore the pieces of text that are
invisible. Who knows.
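To make the trick concrete, here is a minimal sketch of how a filter could recover only the text a visitor actually sees. It assumes the hidden class is known in advance ('ne', as on this page); a real filter would have to read the stylesheet to learn which classes are hidden. The names VisibleText and visible_text are made up for this illustration.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text, skipping anything nested inside a hidden-class element."""

    def __init__(self, hidden_classes=("ne",)):
        super().__init__()
        self.hidden_classes = set(hidden_classes)
        self.hidden_depth = 0  # > 0 while we are inside a hidden element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.hidden_depth or self.hidden_classes.intersection(classes):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data)


def visible_text(html):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


snippet = (
    "<p>Drug <b class=ne>joint pain are </b>treatment <BLINK class=ne>of\n"
    "purchase </BLINK>may <sup class=ne>wellbutrin at </sup>beat <small\n"
    "class=ne>and paxil vs </small>psychotherapy"
)
print(visible_text(snippet))
```

Fed the snippet from the page source, this prints the original headline text without the injected words. The depth counter is naive: a void tag such as br inside a hidden element would throw it off, so treat this as an illustration rather than a robust filter.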
But, back to the spam comment. I use javascript to poke parts of the comment
form indicating that a javascript-capable browser was used to submit the
comment. If javascript is not detected, the comment is denied.
This comment got through, which means that javascript was enabled, which means
that a real web browser probably submitted it.
Here's the Apache log snippet:
72.36.199.114 - - [29/Jan/2007:13:01:17 -0500] "GET /blog/geekery/barcamp-sanfrancisco-2.html HTTP/1.1" 200 15903 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
72.36.199.114 - - [29/Jan/2007:13:01:18 -0500] "GET /style.css HTTP/1.1" 200 2584 "http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
72.36.199.114 - - [29/Jan/2007:13:01:20 -0500] "POST /blog/geekery/barcamp-sanfrancisco-2 HTTP/1.1" 200 16392 "http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
72.36.199.114 - - [29/Jan/2007:13:01:21 -0500] "GET /style.css HTTP/1.1" 200 2584 "http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
It didn't fetch any images, but it did pull style sheets, which is strange
behavior for a simple spam bot that doesn't care about how a page looks. It
also pulled the blog posting page first, then submitted a comment, which is
further indication that this bot is either really clever, or a person is
behind the wheel.
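The same reading of the log can be done mechanically. This is a rough sketch that assumes Apache's combined log format, as in the snippet above; the "looks like a browser" rule is only the heuristic described in the text (page fetched before the POST, stylesheet fetched), and the function names are invented for the example.

```python
import re

# Combined log format: ip, identd, user, [time], "request", status, size,
# "referer", "user-agent".
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def requests_by_ip(lines):
    """Return, per client IP, the ordered list of (method, path) requests."""
    by_ip = {}
    for line in lines:
        match = LOG_RE.match(line)
        if match:
            by_ip.setdefault(match["ip"], []).append(
                (match["method"], match["path"])
            )
    return by_ip


def looks_like_browser(requests):
    """Heuristic: a GET came before the first POST and a stylesheet was fetched."""
    methods = [method for method, _ in requests]
    if "POST" not in methods:
        return False
    fetched_page_first = "GET" in methods[: methods.index("POST")]
    fetched_css = any(path.endswith(".css") for _, path in requests)
    return fetched_page_first and fetched_css
```

Run against the four lines above, looks_like_browser returns True for 72.36.199.114. A client that only ever sends a POST would come back False.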
If you search for the IP, 72.36.199.114, the first hit on Google is an automagically updated list of
known comment spam hosts.
Comments: 3 (view comments)
Tags: spam, site, web
Permalink: /geekery/comment-spam-got-through
posted at: 13:41
Selenium cannot be used to test remote sites because browsers enforce cross-site
scripting protection (the same-origin policy), which prevents scripts from
touching content served from other domains. What happens when we fool the browser into believing content comes from a given domain? An extremely simple squid configuration can do just this.
I wrote one test. This test visits technorati's homepage, searches for
'barcamp' and verifies that a text string exists on the search results.
Steps:
- Tell Firefox to use my squid proxy as an HTTP proxy
- Visit http://www.technorati.com/_selenium/
- Run the tests
Simple, right?
The squid proxy intercepts any requests for '/_selenium' and redirects them
internally to my selenium web server. This has been tested in IE, Firefox, and
Safari with 100% success over vanilla http. HTTPS probably doesn't work for
obvious "Duh, it's encrypted" reasons. Squid can fix this as well with SSL
reverse proxying.
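The routing rule itself is tiny. The sketch below is not squid configuration; it only illustrates, in Python, the decision the proxy makes for each request. The backend address is a placeholder I made up, and upstream_for is a name invented for the example.

```python
from urllib.parse import urlsplit

# Hypothetical address of the local Selenium web server.
SELENIUM_BACKEND = "http://localhost:8000"


def upstream_for(url):
    """Return the URL the proxy should really fetch for a requested URL.

    Anything under /_selenium is served from the Selenium backend, whatever
    host the browser asked for. Everything else passes through untouched.
    """
    parts = urlsplit(url)
    if parts.path == "/_selenium" or parts.path.startswith("/_selenium/"):
        target = SELENIUM_BACKEND + parts.path
        if parts.query:
            target += "?" + parts.query
        return target
    return url
```

Because the browser sees both the Selenium pages and the site under test arriving from the same host name, it treats them as one origin and lets the test scripts drive the page.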
If I run my single test, the result is something that looks like the following
(Firefox and IE, respectively):
Comments: 0 (view comments)
Tags: selenium, squid, web, testing
Permalink: /geekery/squid-selenium-dance-party
posted at: 03:39
The page titles pyblosxom provides are usually great. However, when there is
only one entry displayed, I feel it would be better to rely on that entry's
title.
I wrote a very short plugin to do just that. Turns out the plugin api for
pyblosxom is quite easy to understand, and this hack was only about 10 lines.
pagetitle.py adds a new variable which will contain the standard
page title, unless there is only one entry in view. If there is only one entry
in view, the page title is augmented with the story title as well. This makes
search engine results and browsers happier, as they can recognize what your
page is about by the title. It's good for users and good for search engines.
The new variable you want to use is:
$blog_title_or_entry_title
If you want to get a better idea of what this plugin does, you can click the
permalink below to view only this entry. The page title (in the title bar) should
now reflect this entry's title.
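The decision the plugin makes boils down to a few lines. This sketch shows only that logic as a plain function; it deliberately leaves out the pyblosxom plugin hooks, and the " - " separator is an assumption, not necessarily what pagetitle.py emits.

```python
def blog_title_or_entry_title(blog_title, entry_titles, separator=" - "):
    """Return the page title for a view showing the given entries.

    With exactly one entry in view, the entry title is appended to the blog
    title. Otherwise the standard blog title is used unchanged.
    """
    if len(entry_titles) == 1:
        return blog_title + separator + entry_titles[0]
    return blog_title
```

For example, blog_title_or_entry_title("semicomplete", ["Xvfb + Firefox"]) gives "semicomplete - Xvfb + Firefox", while a front page with several entries keeps the plain blog title.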
download pagetitle.py
Comments: 1 (view comments)
Tags: pyblosxom, hacks, late-night-hacking, python, web, site
Permalink: /geekery/pyblosxom-pagetitle-plugin
posted at: 03:12
Yesterday, I participated in a 12-hour coding-binge competition. It started at
7pm Friday night and ran until 7am Saturday morning. It was fueled by Computer
Science House and Bawls, both sponsors of the event. Needless to say, I haven't
gotten much sleep today.
The competition website is here. Go there if you
want to view this year's objectives.
The Dream Team consisted of John Resig, Darrin
Mann, Matt Bruce, and myself. Darrin, Resig, and I are all quite proficient at
web development, so we decided this year we would represent ourselves as "Team
JavaScript" - and do everything possible in javascript. Bruce is not a
programmer, but I enlisted his graphical art skills because I figured with our
team doing some web-based project, we definitely needed an artist.
After reviewing all the objectives, we came up with a significant modification
upon the Sudoku objective. The sudoku objective was a problem that lacked much
room for innovation, so we went further and instead of solving Sudoku, wrote a
web-based version of an extremely popular game in Second Life. The contest
organizer approved of our new objective, so we did just that.
Resig worked on game logic, I worked on chat features, Darrin worked on scoring
and game generation, and Bruce worked on the interface graphics. Because our
tasks were all mostly unrelated, we could develop them independently. Most of
the game was completed in about 6 hours, and the remainder of the time was
spent fixing bugs, refactoring, and some minor redesign.
The backends were minimal. The chat backend was only 70 lines of perl, and the
score backend was 9 lines of /bin/sh. Everything else was handled in the
browser. We leveraged Resig's jQuery to make development faster. Development
went extremely smoothly, a testament to the "Dream Team"-nature of our team,
perhaps? ;)
The game worked by presenting everyone with the same game - so you can compete
for the highest score. You could also chat during and between games, if you
wanted to.
A screenshot can be found here. At the end of the competition, we only had one
known bug left. That bug didn't affect gameplay, and we were all tired, so it
didn't get fixed. There were a few other issues that remained unresolved that
may or may not be related to our code. Firefox was having issues with various
things we were doing, and we couldn't tell if it was our fault or not.
Despite the fact that I probably shouldn't have attended the competition due to
scholastic time constraints, I was glad I went. We had a blast writing the game.
We may get some time in the near future to improve the codebase and put it up
online so anyone can play. There are quite a few important features that need to
be added before it'll be useful as a public game.
Comments: 1 (view comments)
Tags: nosleep, perl, javascript, web2.0, jquery, shell, web, xml, codebinge
Permalink: /geekery/bawls-competition-tringo
posted at: 19:11
Resig has a bunch of unit tests he runs to make sure jQuery works properly on whatever browser. Manually running and checking unit test results is annoying and time-consuming. Let's automate this.
Combine something simple like Firefox and Xvfb (X Virtual Frame Buffer), and you've got a simple way to run Firefox without a visible display.
Let's start Xvfb:
startx -- `which Xvfb` :1 -screen 0 1024x768x24
This starts Xvfb running on :1 with a screen size of 1024x768 and a color depth of 24 bits per pixel. Now, let's run firefox:
DISPLAY=:1 firefox
# Or, if you run csh or tcsh
env DISPLAY=:1 firefox
Seems simple enough. What now? We want to tell firefox to go to google.com, perhaps.
DISPLAY=:1 firefox-remote http://www.google.com/
Now, let's take a screenshot (requires ImageMagick's import command):
DISPLAY=:1 import -window root googledotcom.png
Let's see what that looks like: googledotcom.png
While this isn't complicated, we could VERY EASILY automate lots of magic using
something like the Selenium extension, all without requiring the use of a
visual display (Monitor). Hopefully I'll find time to work on something cool
using this soon.
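As a starting point, the manual steps above can be strung together in a short script. This is a sketch under a few assumptions: Xvfb, firefox, and ImageMagick's import are all on the PATH, Xvfb is launched directly rather than through startx, and a fixed sleep is a good enough stand-in for "the page has finished loading". The function names are made up for the example.

```python
import os
import subprocess
import time


def display_env(display=":1"):
    """Copy the current environment with DISPLAY pointed at the virtual screen."""
    env = dict(os.environ)
    env["DISPLAY"] = display
    return env


def xvfb_command(display=":1", geometry="1024x768x24"):
    """Command line for a virtual X server with a single screen."""
    return ["Xvfb", display, "-screen", "0", geometry]


def screenshot_command(outfile):
    """Command line for ImageMagick's import, grabbing the whole root window."""
    return ["import", "-window", "root", outfile]


def capture(url, outfile, display=":1", settle_seconds=10):
    """Start Xvfb, open url in firefox, save a screenshot, then clean up."""
    xvfb = subprocess.Popen(xvfb_command(display))
    try:
        time.sleep(1)  # crude wait for the X server to come up
        browser = subprocess.Popen(["firefox", url], env=display_env(display))
        try:
            time.sleep(settle_seconds)  # crude wait for the page to render
            subprocess.check_call(
                screenshot_command(outfile), env=display_env(display)
            )
        finally:
            browser.terminate()
            browser.wait()
    finally:
        xvfb.terminate()
        xvfb.wait()
```

Calling capture("http://www.google.com/", "googledotcom.png") should reproduce the screenshot step from above without anyone typing the commands.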
The problem with screen scraping and other website interaction automation is
that it is almost always done without a browser. For instance, all of my
screen scraping adventures have been using Perl. Browsers already know how to
speak to the web, so why reinvent the wheel?
Firefox has lots of javascript-magic extensions, such as Greasemonkey and Selenium, that let you execute
browser-side javascript and activity automatically. Combine these together with
Xvfb, and you can automate lots of things behind the scenes.
Tie this back to unit tests. Instead of simply displaying results of unit tests,
have the page also report the results to a cgi script on the webserver. This
will let you automatically test websites using a web browser and have it
automatically report the results back to a server.
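The receiving end of that idea can be very small. Here is a sketch written as a WSGI application rather than a classic CGI script; the form field names (passed, failed) and the in-memory RESULTS list are assumptions made for the example, and a real collector would write to a file or database instead.

```python
from urllib.parse import parse_qs

# In-memory store for reported results; illustration only.
RESULTS = []


def app(environ, start_response):
    """Record a test report POSTed as form fields 'passed' and 'failed'."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length).decode("utf-8")
    fields = parse_qs(body)
    RESULTS.append({
        "browser": environ.get("HTTP_USER_AGENT", "unknown"),
        "passed": int(fields.get("passed", ["0"])[0]),
        "failed": int(fields.get("failed", ["0"])[0]),
    })
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"recorded\n"]
```

To try it locally, wsgiref.simple_server.make_server("", 8000, app).serve_forever() will serve it, and the test page only needs to POST its pass and fail counts once the run finishes.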
Comments: 13 (view comments)
Tags: boredom-induced-research, web, firefox, X11, ideas
Permalink: /geekery/xvfb-firefox
posted at: 03:53
I've been working on various touchscreen-related projects off-and-on and I've
always wanted to use Firefox (Gecko, really) as the primary interface. HTML and
JavaScript are quick and easy to hack together to provide a simple interface
for touching and tapping goodness. However, the only touchscreen-based system I
have access to has a 133MHz processor on it and Firefox takes about 15 minutes
to start up.
I knew about the GRE (Gecko Runtime Environment) but I've never bothered learning about Mozilla's Embedded API to write my own application. Someone at Mozilla decided to do just that and more. I found this project today, called XULRunner.
XULRunner is a fantastic application that provides Gecko, XUL, JavaScript, and more, all without a fancy browser wrapping it. This is, in my opinion, perfect for embedded applications that may need/want web-browsing capability.
Applications for XULRunner are devilishly easy to write if you know XUL or have a server to serve webpages. A XULRunner tutorial mentions having to set a "home page" for your XULRunner application. It uses this:
pref("toolkit.defaultChromeURI",
"chrome://applicationName/content/startPage.xul");
I didn't want to write a full XUL application. I wanted to use my existing HTML/JavaScript-based web kiosk pages. So, what do I use?
pref("toolkit.defaultChromeURI",
"http://www.csh.rit.edu/~psionic/projects/kioskweb/demo/");
Start XULRunner, and a window pops up with my webpage in it. Fantastic!
I haven't had a chance to test XULRunner on the touchscreen system yet, but seeing as how it lacks much of the code that makes Firefox slightly bulky, I'm hoping it will start up and run much faster. We'll see soon!
If you're looking to write a web-based application that uses XUL or even simply HTML+JavaScript, give XULRunner a look.
Comments: 0 (view comments)
Tags: open source adventures, web
Permalink: /web/205
posted at: 13:14
LINKS IN THIS DO NOT WORK
I've made some more cool changes to my javascript effects library thing. You
can see a demo of the two new effects here:
http://kenya.csh.rit.edu/static/test.html
Pimp's web interface is coming along smoothly. There are so many layers to this application it hurts! The server is python with json, xmlrpc, sqlite, and a few other pieces. The client is xhtml with javascript and css. There are so many places something can go wrong, whee!
I'll probably push my javascript stuff to this site as soon as it's finished. In the meantime, you can always look through my subversion repo.
Comments: 0 (view comments)
Tags: javascript, web, PERMABROKEN
Permalink: /geekery/200
posted at: 07:22
I've been gradually researching interesting ways to go about templating pages for Pimp 4.0 (rewrite in python). I've come to the conclusion that regexp replacement is hackish. Using a big templating toolkit is too much effort for now. However, I've come up with a solution I've yet to test thoroughly, but the gist of it is:
Use an XML DOM parser to get a DOM-ified version of the webpage. Use XPath to find elements I want to modify and do so as necessary. Poof, templating.
A sample template is layout.html
The following python will parse it and insert "Testing" into the content div.
#!/usr/local/bin/python
import sys
from xml.dom import minidom
# xml.xpath is provided by the PyXML package, not the standard library
from xml import xpath

if __name__ == '__main__':
    foo = minidom.parse("layout.html")
    # Append a text node to the element with 'id="content"'
    div = xpath.Evaluate("//*[@id='content']", foo.documentElement)
    div[0].appendChild(foo.createTextNode("Testing"))
    foo.writexml(sys.stdout)
It seems pretty simple. I'm probably going to come up with a simple-ish xml/xpath way of doing templating, and we'll see how well it actually works later on. Move the complicated parts (complex xpath expressions) into a templating class with an "insert text" or somesuch method and poof, simple templating. Even for complex situations where I may need to produce a table, it is easy to provide a default node-tree to replicate. The particular DOM implementation I am using provides a wonderful cloneNode() method with which to do this.
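Here is a rough sketch of what such "insert text" and "replicate a row" helpers might look like. Note the swap: it uses the standard library's xml.etree.ElementTree and copy.deepcopy in place of PyXML's xpath module and cloneNode(), so it is an illustration of the idea rather than the code Pimp uses. The helper names and the id values in the usage note are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def find_by_id(root, element_id):
    """Find the first descendant with the given id attribute."""
    # Naive string formatting: ids containing a quote are not supported.
    found = root.find(".//*[@id='%s']" % element_id)
    if found is None:
        raise KeyError(element_id)
    return found


def insert_text(root, element_id, text):
    """Append text to the start of the element with the given id."""
    node = find_by_id(root, element_id)
    node.text = (node.text or "") + text


def repeat_row(root, row_id, rows):
    """Clone a template row once per data row and fill in its cells."""
    template = find_by_id(root, row_id)
    parent = next(p for p in root.iter() if template in list(p))
    index = list(parent).index(template)
    parent.remove(template)
    for offset, values in enumerate(rows):
        row = copy.deepcopy(template)
        del row.attrib["id"]  # avoid duplicate ids in the output
        for cell, value in zip(row, values):
            cell.text = value
        parent.insert(index + offset, row)
```

With a template containing a div with id 'content' and a table row with id 'row', calling insert_text(root, "content", "Testing") and then repeat_row(root, "row", [("a", "1"), ("b", "2")]) fills the div and produces two table rows from the one template row.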
Of course, if you know of any other simpler ways of doing templating in python (or in general), definitely let me know :)
Comments: 0 (view comments)
Tags: xml, web, python
Permalink: /geekery/198
posted at: 03:35
Yet Another Rewrite of Pimp, my music jukebox software, has commenced. This
time, I'm writing it in Python. This was the best excuse I could find to learn
python. I've tinkered with it before but never written an application in it.
Anyway, the interface has moved from telnet-based to web-based and uses
XMLHTTPRequest (AJAX) to perform XMLRPC calls on the purely-python webserver.
Python provides a wonderful standard module called 'xmlrpclib' to
marshall/unmarshall XMLRPC requests and responses to/from python and XML.
JavaScript, however, lacks these marshalling features.
Some quick googling found jsolait and vcXMLRPC. Both of these are huge
frameworks and are well beyond my particular needs. BOTH of them have "the suck" and
fail to cleanly load into Firefox without warnings. Bah! Back at square one.
I'm left without a way to marshall xmlrpc requests and responses between
javascript and xml.
I spent some time learning about XMLRPC. Turns out it's a very very simple
xml-based protocol for calling methods and getting results. JavaScript has DOM already so parsing XMLRPC messages is very easy.
Take a look at the 'rpcparam2hash' and 'hash2rpcparam' functions in pimp.js and see how I
convert between JavaScript hashes (dictionaries) and XMLRPC messages. If I get
bored I may create my own xmlrpc library specifically for making xmlrpc calls
with javascript. If you want this to get done, please let me know and give me
encouragement ;)
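To show how little there is to the protocol, here is a sketch in Python rather than JavaScript. It builds a call with the standard library's xmlrpc.client (the Python 3 name for xmlrpclib) and then unpacks the parameters by walking the XML tree by hand, which is roughly the kind of work a DOM-based converter has to do. It handles only strings, ints, and structs; the method name 'search' and its parameters are made up, and this is not the code in pimp.js.

```python
import xml.etree.ElementTree as ET
import xmlrpc.client


def value_to_python(value):
    """Convert one XML-RPC <value> element (string, int, or struct only)."""
    if len(value) == 0:
        return value.text or ""  # an untyped value is a string
    child = value[0]
    if child.tag == "string":
        return child.text or ""
    if child.tag in ("int", "i4"):
        return int(child.text)
    if child.tag == "struct":
        return {
            member.findtext("name"): value_to_python(member.find("value"))
            for member in child.findall("member")
        }
    raise ValueError("unsupported type: " + child.tag)


def params_from_call(xml_text):
    """Return the list of parameters carried by a <methodCall> message."""
    root = ET.fromstring(xml_text)
    return [value_to_python(v) for v in root.findall("./params/param/value")]


# Hypothetical call: one struct parameter, marshalled by the standard library.
message = xmlrpc.client.dumps(
    ({"artist": "Tool", "limit": 5},), methodname="search"
)
print(message)
print(params_from_call(message))
```

Printing the message shows the whole wire format: a methodCall element holding a methodName and a list of param values, with a struct being nothing more than name/value pairs.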
Comments: 1 (view comments)
Tags: javascript, xmlrpc, xml, web, python
Permalink: /geekery/193
posted at: 02:51
I put some more work into my kiosk interface today. I made the keyboard widget highly pluggable, such that you can drop one anywhere on a page. The particular place I wanted to try this first was on the Drink machine login page.
projects/kioskweb/demo/drink.cgi?login
If you do a 'view source' on that page, you'll see that it looks somewhat like html, but there's this little widget tag that you shouldn't recognize. An xslt sheet turns that tag into something more useful. Look in your dom inspector for the actual result. This shows you how I'm somewhat planning on building this web-based kiosk interfacing system.
The end result will be that you can write your pages in pseudo-XHTML and drop in fully featured widgets with simple tags like the widget tag. I currently support two forms of input (xml-wise): XHTML with slight modifications, and something I came up with that's less html-oriented. An example of this can be seen in this directory: projects/kioskweb/demo/xml
The entire interface is in xml; any html pages you may load are actually static html pages generated from xml. If you want to take a look at my xslt sheet, then click here. Opera 8 does not appear to support doing xslt client-side, so if you are using Opera the pages won't render properly, if at all.
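The expansion step can be imitated without XSLT to show what is going on. This sketch uses Python's ElementTree to replace a widget tag with ordinary markup; the type="keyboard" attribute, the generated div/button structure, and the three-key layout are all assumptions for illustration, not what the real stylesheet produces.

```python
import xml.etree.ElementTree as ET


def keyboard_markup(keys="abc"):
    """Build the plain markup that stands in for a keyboard widget."""
    div = ET.Element("div", {"class": "keyboard"})
    for key in keys:
        button = ET.SubElement(div, "button", {"type": "button"})
        button.text = key
    return div


def expand_widgets(root):
    """Replace every <widget type="keyboard"/> with its expanded markup."""
    for parent in list(root.iter()):  # snapshot, since the tree is edited
        for index, child in enumerate(list(parent)):
            if child.tag == "widget" and child.get("type") == "keyboard":
                replacement = keyboard_markup()
                replacement.tail = child.tail  # keep any text after the tag
                parent[index] = replacement
    return root
```

The page author writes one short tag, and the transform is responsible for all the repetitive markup behind it, which is the same division of labor the xslt sheet provides.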
This project is going to be all over xml/xslt like a donkey on a waffle.
Comments: 2 (view comments)
Tags: xml, xslt, web, javascript
Permalink: /web/171
posted at: 00:15