I get emails from this site when someone comments.
This morning, this showed up:
Name: Virtual Pharmacy
Email: [snipped]
URL: [snipped]
Hostname: 114.199.36.72.reverse.layeredtech.com (72.36.199.114)
Entry URL: http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2
Comment location: [snipped]
Everyone repeat, what alcohol should be consumed moderately, but what it means? Why to women
recommend to drink more moderately than to men? What is the female alcoholism? WBR LeoP
A quick Google search for the strange trailing token, "WBR LeoP", reveals a
clear indication that this is comment spam (as if the content didn't give it
away).
The URL the spammer used points at pharmacynewsblog.com, which looks like a normal blog.
It's not.
The content is entirely Viagra-and-friends related, which is fine. However, look at this snippet of visible text from the front page:
Drug treatment may beat psychotherapy at ...
Google for this phrase and you'll find that it's been plagiarized, but deliciously so.
View the page source and you'll see:
<p>Drug <b class=ne>joint pain are </b>treatment <BLINK class=ne>of
purchase </BLINK>may <sup class=ne>wellbutrin at </sup>beat <small
class=ne>and paxil vs </small>psychotherapy
The CSS class 'ne' sets 'display: none', among other properties that keep the
injected text out of the rendered page.
This is quite clever, and appears automated.
pharmacynewsblog.com seems to be a somewhat autogenerated spam blog that
takes news postings about Viagra and the like and injects random HTML into them,
with the intention of defeating antispam solutions. Anti-spam engines probably
aren't smart enough to know that they should ignore the pieces of text that are
invisible. Who knows.
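To make the trick concrete, here is a rough Python sketch of what a filter would have to do to recover the text a visitor actually sees: drop every element carrying the hidden class and keep the rest. It assumes the hidden class is known in advance ('ne', as on this page) and that hidden elements are properly closed. The names VisibleTextExtractor and visible_text are made up for this example.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not change nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class VisibleTextExtractor(HTMLParser):
    """Collects text that is not inside an element with a hidden class."""

    def __init__(self, hidden_classes):
        super().__init__()
        self.hidden_classes = set(hidden_classes)
        self.hidden_depth = 0  # > 0 while inside a hidden element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Enter (or go deeper into) a hidden region.
        if self.hidden_depth or self.hidden_classes.intersection(classes):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.chunks.append(data)


def visible_text(html, hidden_classes=("ne",)):
    """Return the text outside hidden elements, with whitespace collapsed."""
    parser = VisibleTextExtractor(hidden_classes)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


snippet = (
    "<p>Drug <b class=ne>joint pain are </b>treatment <BLINK class=ne>of\n"
    "purchase </BLINK>may <sup class=ne>wellbutrin at </sup>beat <small\n"
    "class=ne>and paxil vs </small>psychotherapy"
)
print(visible_text(snippet))  # prints: Drug treatment may beat psychotherapy
```

A real filter would have to read the stylesheet to learn which classes are hidden, which is presumably why most don't bother.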
But, back to the spam comment. I use JavaScript to poke parts of the comment
form, indicating that a JavaScript-capable browser was used to submit the
comment. If JavaScript is not detected, the comment is denied.
This comment got through, which means that JavaScript was enabled, which means
that it was probably a web browser that did it.
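For illustration only, here is one way the server side of such a check could look in Python. This is not the code this site runs. It assumes the page ships a hidden form field (called js_token here, a made-up name) that starts out empty, and that a bit of inline JavaScript fills it with a per-entry token. SECRET_KEY and the HMAC scheme are likewise assumptions for the sketch.

```python
import hashlib
import hmac

# Hypothetical server-side secret; not a value from this site.
SECRET_KEY = b"change-me"


def expected_token(entry_url):
    """Token the page's JavaScript is expected to copy into the form."""
    return hmac.new(SECRET_KEY, entry_url.encode("utf-8"), hashlib.sha256).hexdigest()


def comment_allowed(form, entry_url):
    """Accept the comment only if the JavaScript-filled field is correct.

    `form` is a plain dict of submitted fields. A client that never ran
    the script leaves `js_token` empty or missing, so it is rejected.
    """
    submitted = form.get("js_token", "")
    return hmac.compare_digest(
        submitted.encode("utf-8"),
        expected_token(entry_url).encode("utf-8"),
    )
```

A check like this only filters out clients that don't run scripts. Anything driving a real browser passes it, which fits what happened here.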
Here's the Apache log snippet:
72.36.199.114 - - [29/Jan/2007:13:01:17 -0500] "GET /blog/geekery/barcamp-sanfrancisco-2.html HTTP/1.1" 200 15903 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
72.36.199.114 - - [29/Jan/2007:13:01:18 -0500] "GET /style.css HTTP/1.1" 200 2584 "http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
72.36.199.114 - - [29/Jan/2007:13:01:20 -0500] "POST /blog/geekery/barcamp-sanfrancisco-2 HTTP/1.1" 200 16392 "http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
72.36.199.114 - - [29/Jan/2007:13:01:21 -0500] "GET /style.css HTTP/1.1" 200 2584 "http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
It didn't fetch any images, but it did pull the style sheet, which is strange
behavior if it's a simple spam bot that doesn't care about how a page looks. It
also pulled the blog posting page first, then submitted a comment. That's a further
indication that this bot is either really clever, or a person is behind the
wheel.
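The same observations can be pulled out of the log mechanically. Below is a small Python sketch that parses lines in Apache's combined log format and reports whether a client fetched CSS, fetched images, and issued a GET before its first POST. The function names and the list of image suffixes are my own choices for this example, and the regex only covers lines shaped like the ones above.

```python
import re

# Apache "combined" log format: host, identity, user, [time], "request",
# status, size, "referer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed list of image extensions; extend as needed.
IMAGE_SUFFIXES = (".png", ".gif", ".jpg", ".jpeg", ".ico")


def parse_line(line):
    """Return the log fields as a dict, or None if the line doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def summarize(lines):
    """Summarize one client's requests, given its log lines in order."""
    requests = [r for r in map(parse_line, lines) if r]
    paths = [r["path"].lower() for r in requests]
    methods = [r["method"] for r in requests]
    first_post = methods.index("POST") if "POST" in methods else None
    return {
        "fetched_css": any(p.endswith(".css") for p in paths),
        "fetched_images": any(p.endswith(IMAGE_SUFFIXES) for p in paths),
        "get_before_post": first_post is not None and "GET" in methods[:first_post],
    }
```

Feeding the four lines above to summarize() gives fetched_css true, fetched_images false, and get_before_post true.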
If you search for the IP, 72.36.199.114, the first hit on Google is an automagically updated list of
known comment spam hosts.