photo
Jordan Sissel
geek

Mon, 29 Jan 2007

Comment spam that got through

I get emails from this site when someone comments.

This morning, this showed up:

Name: Virtual Pharmacy
Email: [snipped]
URL: [snipped]
Hostname: 114.199.36.72.reverse.layeredtech.com (72.36.199.114)
Entry URL: http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2
Comment location: [snipped]

Everyone repeat, what alcohol should be consumed moderately, but what it means? Why to women
 recommend to drink more moderately than to men? What is the female alcoholism? WBR LeoP
A quick google search for the strange tail token, "WBR LeoP" reveals a clear indication that this is comment spam (as if the content didn't give it away).

The url the spammer used points at pharmacynewsblog.com, which looks like a normal blog.

It's not.

The content is entirely viagra-and-friends related, which is fine. However, examine a simple visible text snippet of the following (this is from the frontpage):

Drug treatment may beat psychotherapy at ...
Google for this phrase and you'll find that it's been plagiarized. But deliciously so:

View source, you'll see:

<p>Drug <b class=ne>joint pain are </b>treatment <BLINK class=ne>of
purchase </BLINK>may <sup class=ne>wellbutrin at </sup>beat <small
class=ne>and paxil vs </small>psychotherapy
The css class 'ne' sets 'display: none' among other properties that make it stay out of the way of the browser.

This is quite clever, and appears automated.

pharmacynewsblog.com seems to be a somewhat autogenerated spam blog that takes news postings about viagara and the like and injects random html into it, with the intention of defeating antispam solutions. Anti-spam engines probably aren't smart enough to know that it should ignore the text pieces that are invisible. Who knows.

But, back to the spam comment. I use javascript to poke parts of the comment form indicating that a javascript-capable browser was used to submit the comment. If javascript is not detected, the comment is denied.

This comment got through, which means that javascript was enabled, which means that it was probably a webbrowser that did it.

Here's the apache log snippet:

72.36.199.114 - - [29/Jan/2007:13:01:17 -0500] "GET /blog/geekery/barcamp-sanfrancisco-2.html HTTP/1.1" 200 15903 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
72.36.199.114 - - [29/Jan/2007:13:01:18 -0500] "GET /style.css HTTP/1.1" 200 2584 "http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
72.36.199.114 - - [29/Jan/2007:13:01:20 -0500] "POST /blog/geekery/barcamp-sanfrancisco-2 HTTP/1.1" 200 16392 "http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
72.36.199.114 - - [29/Jan/2007:13:01:21 -0500] "GET /style.css HTTP/1.1" 200 2584 "http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
It didn't fetch any images, but it did pull style sheets, which is strange behavior if it's a simple spam bot that doesn't care about how a page looks. It also pulled the blog posting page first, then submitted a comment. Further indication that this bot is either really clever, or a person is behind the wheel.

If you search for the ip, 72.36.199.114, the first hit on google is an automagically updated list of known comment spam hosts.

Comments: 3 (view comments)
Tags: , ,
Permalink: /geekery/comment-spam-got-through
posted at: 13:41

Sat, 02 Sep 2006

Squid + Selenium == Win

Selenium cannot be used to test remote sites becuase browsers have cross-site scripting protection which prevents you from modifying the content from other domains. What happens when we fool the browser into believing content comes from a given domain? An extremely simple squid configuration can do just this.

I wrote one test. This test visits technorati's homepage, searches for 'barcamp' and verifies that a text string exists on the search results.

Steps:

  1. Tell Firefox to use my squid proxy as an HTTP proxy
  2. Visit http://www.technorati.com/_selenium/
  3. Run the tests
Simple, right?

The squid proxy intercepts any requests for '/_selenium' and redirects them internally to my selenium web server. This has been tested in IE, Firefox, and Safari with 100% success over vanilla http. HTTPS probably doesn't work for obvious "Duh, it's encrypted" reasons. Squid can fix this aswell with ssl reverse proxying.

If I run my single test, the result is something that looks like the following (Firefox and IE, respectively):

Comments: 0 (view comments)
Tags: , , ,
Permalink: /geekery/squid-selenium-dance-party
posted at: 03:39

Tue, 11 Jul 2006

Pyblosxom single-entry page title plugin

The page titles pyblosxom provides are usually great. However, when there is only one entry displayed, I feel it would be better to rely on that entry's title.

I wrote a very short plugin to do just that. Turns out the plugin api for pyblosxom is quite easy to understand, and this hack was only about 10 lines.

pagetitle.py adds a new variable which will contain the standard page title, unless there is only one entry in view. If there is only one entry in view, the page title is augmented with the story title aswell. This makes search engine results and browsers happier, as they can recognize what your page is about by the title. User experience good, also good for search engines.

The new variable you want to use is: $blog_title_or_entry_title

If you want to get a better idea of what this plugin does, you can click the permalink below to view only this entry. The page title (in the url bar) should now reflect this entry's title.

download pagetitle.py

Comments: 1 (view comments)
Tags: , , , , ,
Permalink: /geekery/pyblosxom-pagetitle-plugin
posted at: 03:12

Sat, 06 May 2006

The CSH Bawls Programming Competition

Yesterday, I participated in a 12-hour coding-binge competition. It started at 7pm Friday night and ran until 7am Saturday morning. It was fueled by Computer Science House and Bawls, both sponsors of the event. Needless to say, I haven't gotten much sleep today.

The competition website is here. Go there if you want to view this year's objectives.

The Dream Team consisted of John Resig, Darrin Mann, Matt Bruce, and myself. Darrin, Resig, and I are all quite proficient at web development, so we decided this year we would represent ourselves as "Team JavaScript" - and do everything possible in javascript. Bruce is not a programmer, but I enlisted his graphical art skills because I figured with our team doing some web-based project, we definitely needed an artist.

After reviewing all the objectives, we came up with a significant modification upon the Sudoku objective. The sudoku objective was a problem that lacked much room for innovation, so we went further and instead of solving Sudoku, wrote a web-based version of an extremely popular game in Second Life. The contest organizer approved of our new objective, so we did just that.

Resig worked on game logic, I worked on chat features, Darrin worked on scoring and game generation, and Bruce worked on the interface graphics. Becuase our tasks were all mostly unrelated, we could develop them independently. Most of the game was completed in about 6 hours, and the remainder of the time was spent fixing bugs, refactoring, and some minor redesign.

The backends were minimal. The chat backend was only 70 lines of perl, and the score backend was 9 lines of /bin/sh. Everything else was handled in the browser. We leveraged Resig's jQuery to make development faster. Development went extremely smooth, a testament to the "Dream Team"-nature of our team, perhaps? ;)

The game worked by presenting everyone with the same game - so you can compete for the highest score. You could also chat during and between games, if you wanted to.

A screenshot can be found here. At the end of the competition, we only had one known bug left. That bug didn't affect gameplay, and we were all tired, so it didn't get fixed. There were a few other issues that remained unresolved that may or may not be related to our code. Firefox was having issues with various things we were doing, and we couldn't tell if it was our fault or not.

Despite the fact that I probably shouldn't have attended the competition due to scholastic time constraints, I was glad I went. We had a blast writing the game.

We may get some time in the near future to improve the codebase and put it up online so anyone can play. There are quite a few important features that need to be added before it'll be useful as a public game.

Comments: 1 (view comments)
Tags: , , , , , , , ,
Permalink: /geekery/bawls-competition-tringo
posted at: 19:11

Sat, 08 Apr 2006

Xvfb + Firefox

Resig has a bunch of unit tests he does to make sure jQuery works properly on whatever browser. Manually running and checking unit test results is annoying and time consuming. Let's automate this.

Combine something simple like Firefox and Xvfb (X Virtual Frame Buffer), and you've got a simple way to run Firefox without a visible display.

Let's start Xvfb:

startx -- `which Xvfb` :1 -screen 0 1024x768x24
This starts Xvfb running on :1 with a screen size of 1024x768 and 24bits/pixel color depth. Now, let's run firefox:
DISPLAY=:1 firefox

# Or, if you run csh or tcsh
env DISPLAY=:1 firefox
Seems simple enough. What now? We want to tell firefox to go to google.com, perhaps.
DISPLAY=:1 firefox-remote http://www.google.com/
Now, let's take a screenshot (requires ImageMagick's import command):
DISPLAY=:1 import -window root googledotcom.png
Lets see what that looks like: googledotcom.png

While this isn't complicated, we could VERY EASILY automate lots of magic using something like the Selenium extension, all without requiring the use of a visual display (Monitor). Hopefully I'll find time to work on something cool using this soon.

Problems with screen scraping and other website interaction automation is that it almost always needs to be done without a browser. For instance, all of my screen scraping adventures have been using Perl. Browsers already know how to speak to the web, so why reinvent the wheel?

Firefox has lots of javascript-magic extensions such as greasemonkey and Selenium to let you execute browser-side javascript and activity automatically. Combine these together with Xvfb, and you can automate lots of things behind the scenes.

Tie this back to unit tests. Instead of simply displaying results of unit tests, have the page also report the results to a cgi script on the webserver. This will let you automatically test websites using a web browser and have it automatically report the results back to a server.

Comments: 13 (view comments)
Tags: , , , ,
Permalink: /geekery/xvfb-firefox
posted at: 03:53

Sun, 08 Jan 2006

xulrunner

I've been working on various touchscreen-related projects off-and-on and I've always wanted to use Firefox (Gecko, really) as the primary interface. HTML and JavaScript are quick and easy to hack together to provide a simple interface for touching and tapping goodness. However, the only touchscreen-based system I have access to has a 133mHz processor on it and Firefox takes about 15 minutes to start up.

I knew about the GRE (Gecko Runtime Environment) but I've never bothered learning about Mozilla's Embedded API to write my own application. Someone at Mozilla decided to do just that and more. I found this project today, called XULRunner.

XULRunner is a fantastic application that provides Gecko, XUL, JavaScript, et al; all without a fancy browser wrapping it. This is, in my opinion, perfect for embedded applications that may need/want web-browsing capability.

Applications for XULRunner are devilishly easy to write if you know XUL or have a server to serve webpages. A XULRunner tutorial mentions having to set a "home page" for your XULRunner application, it uses this:

pref("toolkit.defaultChromeURI", 
     "chrome://applicationName/content/startPage.xul");
I didn't want to write a full XUL application. I wanted to use my existing HTML/JavaScript-based web kiosk pages. So, what do I use?
pref("toolkit.defaultChromeURI",
     "http://www.csh.rit.edu/~psionic/projects/kioskweb/demo/");
Start XULRunner, and a window pops up with my webpage in it. Fantastic!

I haven't had a chance to test XULRunner on the touchscreen system yet, but seeing as how it lacks much of the code that makes firefox slightly bulky I'm hoping it will startup and run much faster. We'll see soon!

If you're looking to write a web-based application that uses XUL or even simply HTML+JavaScript, give XULRunner a look.

Comments: 0 (view comments)
Tags: ,
Permalink: /web/205
posted at: 13:14

Fri, 23 Dec 2005

javascript animation

LINKS IN THIS DO NOT WORK

I've made some more cool changes to my javascript effects library thing. You can see a demo of the two new effects here:

http://kenya.csh.rit.edu/static/test.html

Pimp's web interface is coming along smoothly. There are so many layers to this application it hurts!The server is python with json, xmlrpc, sqlite, and a few other pieces. The client is xhtml with javascript and css. There are so many places something can go wrong, whee!

I'll probably push my javascript stuff to this site as soon as it's finished. In the meantime, you can always look through my subversion repo.

Comments: 0 (view comments)
Tags: , ,
Permalink: /geekery/200
posted at: 07:22

Tue, 13 Dec 2005

templating with xhtml, xpath, and python

I've been gradually researching interesting ways to go about templating pages for Pimp 4.0 (rewrite in python). I've come to the conclusion that regexp replacement is hackish. Using a big templating toolkit is too much effort for now. However, I've come up with a solution I've yet to test thorougly, but the gist of it is:

Use an XML DOM parser to get a DOM-ified version of the webpage. Use XPath to find elements I want to modify and do so as necessary. Poof, templating. A sample template is layout.html

The following python will parse it and insert "Testing" into the content div.

#!/usr/local/bin/python

import sys
from xml.dom import minidom
from xml import xpath

if __name__ == '__main__':
        foo = minidom.parse("layout.html")

        # Append a text node to the element with 'id="content"'
        div = xpath.Evaluate("//*[@id='content']", foo.documentElement)
        div[0].appendChild(foo.createTextNode("Testing"))
        foo.writexml(sys.stdout)
It seems pretty simple. I'm probably going to come up with a simple-ish xml/xpath way of doing templating. We'll see how well it actually works later on, but for now it seems like a pretty simple way of doing templating. Move the complicated parts (complex xpath notions) to a templating class with an "insert text" or somesuch method and poof, simple templating. Even for complex situations where I may need to produce a table it is easy to provide a default node-tree for replicating. The particular DOM implementation I am using provides me a wonderful cloneNode() method with which to do this.

Ofcourse, if you know of any other simpler ways of doing templating in python (or in general) definitely let me know :)

Comments: 0 (view comments)
Tags: , ,
Permalink: /geekery/198
posted at: 03:35

Tue, 25 Oct 2005

xmlrpc javascript library and pimp rewrite

Yet Another Rewrite of Pimp, my music jukebox software, has commenced. This time, I'm writing it in Python. This was the best excuse I could find to learn python. I've tinkered with it before but never written an application in it.

Anyway, the interface has moved from telnet-based to web-based and uses XMLHTTPRequest (AJAX) to perform XMLRPC calls on the purely-python webserver. Python provides a wonderful standard module called 'xmlrpclib' to marshall/unmarshall XMLRPC requests and responses to/from python and XML. JavaScript, howver, lacks these marshalling features.

Some quick googling found jsolait and vcXMLRPC. Both of these are huge frameworks and are well beyond my particular needs. BOTH of them have "the suck" and fail to cleanly load into Firefox without warnings. Bah! Back at square-one. I'm left without a way to marshall xmlrpc requests and responses between javascript and xml

I spent some time learning about XMLRPC. Turns out it's a very very simple xml-based protocol for calling methods and getting results. JavaScript has DOM already so parsing XMLRPC messages is very easy.

Take a look at the 'rpcparam2hash' and 'hash2rpcparam' functions in pimp.js and see how I convert between JavaScript hashes (dictionaries) and XMLRPC messages. If I get bored I may create my own xmlrpc library specifically for making xmlrpc calls with javascript. If you want this to get done, please let me know and give me encouragement ;)

Comments: 1 (view comments)
Tags: , , , ,
Permalink: /geekery/193
posted at: 02:51

Thu, 26 May 2005

xml, xslt, and kioskweb!

I put some more work into my kiosk interface today. I made the keyboard widget highly pluggable, such that you can drop one anywhere on a page. The particular place I wanted to try this first was on the Drink machine login page.

projects/kioskweb/demo/drink.cgi?login

If you do a 'view source' on that page, you'll see that it looks somewhat like html, but there's this little widget tag that you shouldn't recognize. An xslt sheet turns that tag into something more useful - Look in your dom inspector for the actual result. This shows you how I'm somewhat planning on building this web-based kiosk interfacing system.

The end result will be that you can write your pages in psuedo XHTML and drop in fully featured widgets with simple tags like the widget tag. I currently support two forms of input (xml-wise) - those are XHTML with slight modifications and something I came up with that's less html-oriented. An example of this can be seen in this directory: projects/kioskweb/demo/xml

The entire interface is in xml, any html pages you may load are actually static html pages generated from xml. If you want to take a look at my xslt sheet, then click here. Opera 8 does not appear to support doing xslt client-side, so if you are using opera the pages won't render properly if at all.

This project is going to be all over xml/xslt like a donkey on a waffle.

Comments: 2 (view comments)
Tags: , , ,
Permalink: /web/171
posted at: 00:15

Search this site

Navigation

Page 1 of 2  [next]

Metadata

Home About Resume My Code (SVN)

Articles

ARP Security Dynamic DNS with DHCP OpenLDAP+Kerberos+SASL PPP over SSH SSH Security: /bin/false Week of Unix Tools Work Efficiency

Projects

fex firefox tabsearch firefox urledit grok keynav liboverride newpsm (FreeBSD) nis2ldap pam_captcha poor man's backup Solaris audio utility xboxproxy xdotool xmlpresenter xpathtool misc scripts

Presentations

Yahoo! Hack Day '06 Unix Essentials Vi/Vim Essentials

Tag Cloud

Calendar

< January 2007 >
SuMoTuWeThFrSa
  1 2 3 4 5 6
7 8 910111213
14151617181920
21222324252627
28293031   

Friends

BarCamp Kent Brewster Tantek Çelik John Resig Wesley Shields Tyler Shields

Technorati