Search this site


Metadata

Articles

Projects

Presentations

Comment spam that got through

I get emails from this site when someone comments.

This morning, this showed up:

Name: Virtual Pharmacy
Email: [snipped]
URL: [snipped]
Hostname: 114.199.36.72.reverse.layeredtech.com (72.36.199.114)
Entry URL: http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2
Comment location: [snipped]

Everyone repeat, what alcohol should be consumed moderately, but what it means? Why to women
 recommend to drink more moderately than to men? What is the female alcoholism? WBR LeoP
A quick google search for the strange tail token, "WBR LeoP" reveals a clear indication that this is comment spam (as if the content didn't give it away).

The url the spammer used points at pharmacynewsblog.com, which looks like a normal blog.

It's not.

The content is entirely viagra-and-friends related, which is fine. However, examine a simple visible text snippet of the following (this is from the frontpage):

Drug treatment may beat psychotherapy at ...
Google for this phrase and you'll find that it's been plagiarized. But deliciously so:

View source, you'll see:

<p>Drug <b class=ne>joint pain are </b>treatment <BLINK class=ne>of
purchase </BLINK>may <sup class=ne>wellbutrin at </sup>beat <small
class=ne>and paxil vs </small>psychotherapy
The css class 'ne' sets 'display: none' among other properties that make it stay out of the way of the browser.

This is quite clever, and appears automated.

pharmacynewsblog.com seems to be a somewhat autogenerated spam blog that takes news postings about viagara and the like and injects random html into it, with the intention of defeating antispam solutions. Anti-spam engines probably aren't smart enough to know that it should ignore the text pieces that are invisible. Who knows.

But, back to the spam comment. I use javascript to poke parts of the comment form indicating that a javascript-capable browser was used to submit the comment. If javascript is not detected, the comment is denied.

This comment got through, which means that javascript was enabled, which means that it was probably a webbrowser that did it.

Here's the apache log snippet:

72.36.199.114 - - [29/Jan/2007:13:01:17 -0500] "GET /blog/geekery/barcamp-sanfrancisco-2.html HTTP/1.1" 200 15903 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
72.36.199.114 - - [29/Jan/2007:13:01:18 -0500] "GET /style.css HTTP/1.1" 200 2584 "http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
72.36.199.114 - - [29/Jan/2007:13:01:20 -0500] "POST /blog/geekery/barcamp-sanfrancisco-2 HTTP/1.1" 200 16392 "http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
72.36.199.114 - - [29/Jan/2007:13:01:21 -0500] "GET /style.css HTTP/1.1" 200 2584 "http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
It didn't fetch any images, but it did pull style sheets, which is strange behavior if it's a simple spam bot that doesn't care about how a page looks. It also pulled the blog posting page first, then submitted a comment. Further indication that this bot is either really clever, or a person is behind the wheel.

If you search for the ip, 72.36.199.114, the first hit on google is an automagically updated list of known comment spam hosts.