Referrer spam (or referer spam, following the misspelling enshrined in the HTTP specification) has been a problem at snarfed.org for years. I use Summary for web analytics, and I made its statistics pages publicly available for a while, so spammers hit this site with fake referrers, hoping to get links from the Summary pages.
There are a number of approaches to fighting referrer spam, but so far, no silver bullet. Here’s what I did while my Summary output was online.
I maintained a hand-edited blacklist of known spammers’ sites. It’s far from ideal, but it worked. You can find my blacklist in my summary.conf file. If you’re fighting referrer spam, feel free to borrow it. (I used to use webalizer; my webalizer.conf is also available, but it’s not maintained.)
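To illustrate how a domain blacklist like this can be applied, here is a minimal Python sketch. It is not how Summary itself matches entries; the blacklist contents and the function names are hypothetical, and it assumes a referrer counts as spam when its host is a listed domain or a subdomain of one.

```python
from urllib.parse import urlsplit

# Hypothetical entries for illustration only, not real spammers' sites.
BLACKLIST = {"spam-example.com", "casino.example.net"}


def referrer_host(referrer):
    """Return the lowercased hostname of a referrer URL, or '' if it has none."""
    host = urlsplit(referrer).hostname
    return host.lower() if host else ""


def is_blacklisted(referrer, blacklist=BLACKLIST):
    """True if the host is a blacklisted domain or a subdomain of one."""
    host = referrer_host(referrer)
    return any(host == d or host.endswith("." + d) for d in blacklist)
```

Matching on the dot-separated suffix, rather than on a plain substring, keeps an unrelated domain that merely ends in the same letters from being caught by accident.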
I used to use iptables to block known spammers’ IP addresses and subnets, such as those belonging to marketscore.com, bezeqint.net, and ac.at.
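For anyone who wants to try the same thing, the rules can be generated from a list of addresses. The sketch below is only an assumption about how one might do it, not my actual ruleset: the addresses are documentation placeholders, drop_rules is a hypothetical helper, and it only prints standard iptables commands (append to the INPUT chain, match on source, DROP) rather than running them.

```python
import ipaddress


def drop_rules(networks):
    """Yield one iptables command per IPv4 address or CIDR block.

    Host bits are masked off, so "198.51.100.77/24" becomes 198.51.100.0/24.
    Invalid input raises ValueError instead of producing a bad rule.
    """
    for text in networks:
        net = ipaddress.IPv4Network(text, strict=False)
        yield f"iptables -A INPUT -s {net} -j DROP"


if __name__ == "__main__":
    # Placeholder addresses from the ranges reserved for documentation.
    for rule in drop_rules(["192.0.2.17", "198.51.100.77/24"]):
        print(rule)
```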
I’d eventually like to use existing blacklists and anti-referrer-spam tools, such as Jay Allen’s MT-Blacklist. However, most of those tools are specific to Movable Type and Apache, and this site uses SnipSnap instead.
A better approach would be to write tools that operate directly on referrer logs, so they can be used with any web server or CMS and any web analytics package. Tony Buser’s derefspam.pl script is a good first step. I’d like to extend it to use DNSBLs and RBLs like Spamhaus, BSB, Blitzed, and SURBL.
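Here is a rough Python sketch of what such a log-level filter could look like. It is not derefspam.pl, and it rests on several assumptions: the log is in Combined Log Format, the blacklist is a domain-based DNS zone (multi.surbl.org is used as the default), and any successful lookup means "listed". A real tool would also reduce the hostname to its registered domain and inspect the returned 127.0.0.x code, since some lists answer with special codes for blocked or over-quota resolvers. The function names are hypothetical.

```python
import re
import socket
from urllib.parse import urlsplit

# Tail of a Combined Log Format line: "request" status bytes "referrer" "agent"
LOG_RE = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)" "[^"]*"$')


def referrer_domain(line):
    """Return the referrer hostname from one log line, or None if absent."""
    match = LOG_RE.search(line.rstrip("\n"))
    if not match:
        return None
    return urlsplit(match.group("referrer")).hostname


def listed(domain, zone="multi.surbl.org", resolve=socket.gethostbyname):
    """True if `domain` resolves under the blacklist zone (assumed to mean listed)."""
    try:
        resolve(f"{domain}.{zone}")
        return True
    except socket.gaierror:
        return False


def filter_log(lines, is_spam=listed):
    """Yield only the log lines whose referrer is not flagged as spam."""
    for line in lines:
        domain = referrer_domain(line)
        if domain is None or not is_spam(domain):
            yield line
```

Because filter_log takes any iterable of lines and any predicate, it can sit in front of whichever analytics package reads the log, and the DNS check can be swapped for a local blacklist or combined with one.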