Referrer spam (or, by the HTTP specification’s original misspelling, referer spam) was a problem at snarfed.org for years. I used to use Summary for web analytics, and for a while I made its statistics pages publicly available, so spammers hit this site with fake referrers, hoping their sites would be linked from the Summary pages.
There are a number of approaches to fighting referrer spam, but so far, no silver bullet. Here’s what I did while my Summary output was online.
I maintained a hand-edited blacklist of known spammers’ sites in my summary.conf file. It was far from ideal, but it worked. If you’re fighting referrer spam, feel free to borrow it. I used to use webalizer, and did the same in its configuration file.
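A referrer blacklist boils down to matching each referrer's host against a list of spam domains. Here is a minimal Python sketch of that idea. It does not read Summary's or webalizer's config format, and the domains in `BLACKLIST` and the function names are hypothetical placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist of spam domains (placeholders, not the real summary.conf list).
BLACKLIST = {"spam-example.com", "cheap-pills.example"}

def referrer_host(referrer: str) -> str:
    """Return the referrer URL's hostname (urlsplit lowercases it), or '' if it has none."""
    return urlsplit(referrer).hostname or ""

def is_blacklisted(referrer: str, blacklist=BLACKLIST) -> bool:
    """True if the referrer's host is a blacklisted domain or a subdomain of one."""
    host = referrer_host(referrer)
    return any(host == d or host.endswith("." + d) for d in blacklist)

def filter_referrers(referrers, blacklist=BLACKLIST):
    """Drop blacklisted referrers, keeping the rest in their original order."""
    return [r for r in referrers if not is_blacklisted(r, blacklist)]
```

Matching on the parsed hostname rather than a raw substring avoids false positives, e.g. `notspam-example.com` is not caught by the `spam-example.com` entry.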
I also wrote a webalizer nofollow patch that adds support for the then-new rel="nofollow" link attribute.
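The idea behind `rel="nofollow"` is that search engines are asked not to pass ranking credit through such links, which removes the payoff spammers want from appearing on public stats pages. This is not the webalizer patch itself; it's a minimal Python sketch, using a hypothetical `referrer_link` helper, of the markup a stats page would emit for each referrer.

```python
import html

def referrer_link(url: str) -> str:
    """Render a referrer as an HTML link marked rel="nofollow"."""
    # Escape the URL so a hostile referrer can't break out of the attribute or inject markup.
    escaped = html.escape(url, quote=True)
    return f'<a href="{escaped}" rel="nofollow">{escaped}</a>'
```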
For a while, I used iptables to block known spammers’ IP addresses and subnets, such as those belonging to marketscore.com, bezeqint.net, and ac.at.
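Blocking by address amounts to one iptables `DROP` rule per address or subnet. This sketch generates those rules from a list and prints them rather than running them, since applying them requires root. The entries in `BLOCKED` are RFC 5737 documentation ranges standing in for real spammer subnets, which you'd have to look up yourself; `iptables_rules` and `is_blocked` are hypothetical helper names.

```python
import ipaddress

# Placeholder blocklist: RFC 5737 documentation ranges, not real spammers.
BLOCKED = ["192.0.2.0/24", "198.51.100.7", "203.0.113.0/25"]

def iptables_rules(entries, chain="INPUT"):
    """Yield one iptables rule per entry that drops traffic from that source."""
    for entry in entries:
        net = ipaddress.ip_network(entry)  # raises ValueError on typos; bare IPs become /32
        yield f"iptables -A {chain} -s {net} -j DROP"

def is_blocked(ip: str, entries=BLOCKED) -> bool:
    """True if the address falls inside any blocked network (handy for checking logs)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(e) for e in entries)

if __name__ == "__main__":
    for rule in iptables_rules(BLOCKED):
        print(rule)
```

Validating each entry with `ipaddress` before emitting a rule catches a mistyped subnet up front instead of leaving it to iptables.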
I like existing blacklist and referrer spam tools such as Jay Allen’s MT-Blacklist. However, most of them are specific to MovableType and Apache, while this site uses SnipSnap.
A better approach would be to write tools that operate directly on referrer logs, so they could be used with any web server, CMS, or web analytics package. Tony Buser’s derefspam.pl script is a good first step. You could extend it to check referrers against DNS-based blocklists (DNSBLs and RBLs) such as Spamhaus, BSB, Blitzed, and SURBL.
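As a rough illustration of that log-based approach (not a port of derefspam.pl, which is Perl), here's a Python sketch that reads Combined Log Format lines, extracts each referrer, and checks the referrer's host against a DNS blocklist. A DNSBL lookup is an ordinary DNS query: IPv4 addresses are queried with their octets reversed, domain-based lists like SURBL with the domain itself, and any answer means "listed". The zone names `zen.spamhaus.org` and `multi.surbl.org` are used only as examples; check each list's current zones and usage terms before querying it.

```python
import ipaddress
import re
import socket
from typing import Optional
from urllib.parse import urlsplit

# Simplified Combined Log Format: ip ident user [time] "request" status bytes "referrer" "agent".
# Assumes the request field contains no escaped quotes.
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)

def dnsbl_name(host: str, zone: str) -> Optional[str]:
    """Build the DNSBL query name for an IPv4 address or a domain; None for IPv6."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: domain-based lists such as SURBL take the domain itself.
        return f"{host}.{zone}"
    if addr.version != 4:
        return None  # IPv6 lists use a nibble-reversed format, skipped in this sketch
    # IP-based lists reverse the octets: 192.0.2.1 -> 1.2.0.192.<zone>
    return ".".join(reversed(str(addr).split("."))) + "." + zone

def is_listed(name: str, resolve=socket.gethostbyname) -> bool:
    """Treat any answer as a listing and a failed lookup (e.g. NXDOMAIN) as not listed."""
    try:
        resolve(name)
        return True
    except socket.gaierror:
        return False

def spam_referrers(lines, zone, resolve=socket.gethostbyname):
    """Yield (client_ip, referrer) for log lines whose referrer host is listed in `zone`."""
    for line in lines:
        m = COMBINED.match(line)
        if not m or m["referrer"] in ("", "-"):
            continue
        host = urlsplit(m["referrer"]).hostname
        name = dnsbl_name(host, zone) if host else None
        if name and is_listed(name, resolve):
            yield m["ip"], m["referrer"]

if __name__ == "__main__":
    import sys
    for ip, ref in spam_referrers(sys.stdin, "multi.surbl.org"):
        print(ip, ref)
```

The resolver is a parameter so the filter can be tested without network access. A real tool would also inspect the returned address, since some lists signal errors with special return codes, and would reduce hosts to their registered domain before querying SURBL-style lists (doing that properly needs the Public Suffix List, which this sketch skips).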