I am running three small private websites that I care about that use Google Analytics.
And every other day my reports get messed up by some stupid referral spam:
Lots of entries from spam websites such as share-buttons.xyz, free-traffic.xyz, traffic2cash.xyz or с.новым.годом.рф shamelessly begging for traffic.
… I don’t know if this bothers you as much as it does me - but I am sick of it and decided to do something about it.
What can you do against Referrer Spam?
You cannot prevent the spammers from sending false analytics data to your Google Analytics accounts. But there are two things you can do to mitigate the effects of referral spam in your Google Analytics reports:
- Use filters to block the spam domains from your reports (proactive)
- Use custom segments to exclude spam in your reports (retroactive)
Even though these measures work, they still suck: using filters and segments to exclude spam from your analytics data means manual effort for something that shouldn’t be necessary, especially since Google could do that for us. But for reasons unknown to me they don’t.
So we form a vigilante group to protect ourselves from the spammers 👊.
… and since we are developers we build a machine 👾 which does the fighting for us.
Introducing “Google Analytics Spam Control”
To keep my referrer spam filters up-to-date without having to update them myself I have created a tool which does that for me: ga-spam-control
ga-spam-control (as in “Google Analytics Spam Control”) uses lists of known referrer spam domain names to create and update regular Google Analytics filters which block these domains from your reports.
… unfortunately filters don’t take effect on existing spam entries. But future reports will no longer include records with domain names which were identified as referrer spam.
Animation: Using segments to filter spam that made it through the filters
ga-spam-control is a command-line utility for Linux, Mac OS and Windows. Its four most important commands are:
- filters status: Display your spam control status
- filters update: Update the spam filters of a given analytics account
- domains update: Update the list of known referrer spam domains
- domains find: Find new referrer spam in your analytics data
For full documentation of all available options use the help action or have a look at the project documentation at github.com/andreaskoch/ga-spam-control.
When you use ga-spam-control for the first time you will be asked to authorize the application to access your Google Analytics data and filter controls:
Animation: OAuth authorization during first call to filters status
Your credentials are stored in ~/.ga-spam-control/credentials.json. For future actions, as long as your Google Analytics access token is valid, you will not be asked for authorization again.
Show your spam-control status
To show the current spam-control status of the Google Analytics accounts that you have access to, you can use the filters status action:
ga-spam-control filters status
Animation: ga-spam-control filters status
Because there are so many known referrer spam domains (currently more than 900) ga-spam-control cannot simply create a single filter. The pattern of a single Google Analytics filter is limited to 255 characters, so ga-spam-control distributes all spam domains across multiple filters:
Animation: Google Analytics filters created by ga-spam-control
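The splitting described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the function name build_filter_patterns, the idea of joining escaped domain names with | into one regular expression per filter, and the alphabetical ordering are all assumptions. Only the 255-character limit comes from the text.

```python
MAX_PATTERN_LENGTH = 255  # per-filter limit mentioned in the text


def build_filter_patterns(domains, max_length=MAX_PATTERN_LENGTH):
    """Group domains into '|'-joined patterns of at most max_length characters.

    Hypothetical helper: the real pattern format used by ga-spam-control
    is not described in this post.
    """
    patterns = []
    current = ""
    for domain in sorted(set(domains)):
        # Escape dots so they match literally in a regular expression.
        escaped = domain.replace(".", r"\.")
        if len(escaped) > max_length:
            raise ValueError(f"domain too long for a single filter: {domain}")
        candidate = escaped if not current else current + "|" + escaped
        if len(candidate) <= max_length:
            # The domain still fits into the filter being built.
            current = candidate
        else:
            # The filter is full: store it and start a new one.
            patterns.append(current)
            current = escaped
    if current:
        patterns.append(current)
    return patterns
```

Called with a list of 900 domain names, a function like this returns several patterns, each short enough to be stored as one filter.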
The percentage behind each account indicates how many of the known referrer spam domains are currently blocked by the existing spam filters of your Google Analytics accounts. When the percentage is 100% it means all known referrer spam domains are blocked. 90% means most spam domain names are being blocked. And 0% generally means that you don’t have any spam filters installed.
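As a rough illustration of what such a percentage expresses, here is a minimal Python sketch. The function name coverage_percent and the way the two domain lists are passed in are assumptions made for this example; how ga-spam-control actually calculates the value is not described here.

```python
def coverage_percent(known_domains, blocked_domains):
    """Return the share of known spam domains that are blocked, in percent.

    Hypothetical helper for illustration only.
    """
    known = set(known_domains)
    if not known:
        # Assumption: with no known spam domains there is nothing left to block.
        return 100.0
    blocked = known & set(blocked_domains)
    return 100 * len(blocked) / len(known)
```

With four known domains of which three are covered by filters, this gives 75.0.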
Update or install spam-control filters
To install (or update) the spam-control filters for a given account you can use the filters update command:
ga-spam-control filters update <accountID>
Animation: Installing spam-control filters with ga-spam-control filters update
Update referrer spam domain list
To keep up with spammers you need to regularly update your spam domain list using the domains update command:
ga-spam-control domains update
Animation: Updating the local spam domain lists with ga-spam-control domains update
Find new referrer spam domains
Besides the lists of referrer spam domain names that are maintained by the community you can also maintain your personal list of spam domains. In earlier versions of ga-spam-control this was done by a machine-learning algorithm, but because I could not get that to work reliably I made the spam detection process manual:
ga-spam-control domains find <accountID> <numberOfDaysToReview>
Animation: Locating new referrer spam in your analytics data using ga-spam-control domains find
What is special about ga-spam-control?
There are already other tools which also create and maintain Google Analytics filters for you. But there are a few points in which ga-spam-control is different:
- Unlike most other tools, ga-spam-control is a command-line utility that can be scheduled to run automatically on any number of Google Analytics accounts. No manual interaction required.
- ga-spam-control is cross-platform and works on Linux, OS X and Windows.
- The source code of ga-spam-control is publicly available at github.com/andreaskoch/ga-spam-control, open for review and change requests.
- ga-spam-control uses multiple community referrer spam lists as a source for referrer spam domains.
- ga-spam-control makes it easy to find new referrer spam in your analytics data and add these domain names to your list of known referrer spam.
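To make the point about scheduled, unattended runs more concrete, here is a small Python sketch of a wrapper script that a scheduler such as cron or the Windows Task Scheduler could start. It only uses the two commands shown earlier in this post (domains update and filters update <accountID>). The function names and the account IDs are placeholders, and the sketch assumes that the authorization step has already been completed once so that no interactive prompt appears.

```python
import subprocess


def build_commands(account_ids):
    """Return the ga-spam-control calls for one unattended run."""
    # First refresh the local spam domain list ...
    commands = [["ga-spam-control", "domains", "update"]]
    # ... then update the filters of every account.
    for account_id in account_ids:
        commands.append(["ga-spam-control", "filters", "update", str(account_id)])
    return commands


def run_all(account_ids, runner=subprocess.run):
    """Run all commands in order and stop at the first failure."""
    for command in build_commands(account_ids):
        # check=True raises CalledProcessError on a non-zero exit code.
        runner(command, check=True)
```

A script that calls run_all(["<accountID>", ...]) with your own account IDs could then be scheduled to run once a day.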
How does ga-spam-control work?
ga-spam-control builds a list of known referrer spam domains and creates and maintains Google Analytics View Filters for these domains.
ga-spam-control uses …
- OAuth 2.0 to access the Google Analytics API
- the Google Analytics Reports API for extracting analytics data for spam detection
- the Google Analytics View Filters API to create, update and delete Google Analytics filters which exclude known referrer spam domains from your analytics reports
- lists of known referrer spam domain names that are maintained by the community (in earlier versions these served as a fallback to the machine-learning service for spam detection)
- your home directory (~/.ga-spam-control) for storing your Google Analytics OAuth credentials and the list of known referrer spam domain names
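Building one list from several community lists plus a personal list could look roughly like the following Python sketch. The function name merge_domain_lists and the input format (one domain name per line, with blank lines and lines starting with # ignored) are assumptions for this example and not taken from the ga-spam-control source code.

```python
def merge_domain_lists(*lists):
    """Merge several domain lists into one sorted list without duplicates.

    Hypothetical helper: each list is an iterable of text lines.
    """
    merged = set()
    for lines in lists:
        for line in lines:
            domain = line.strip().lower()
            # Skip blank lines and comment lines.
            if domain and not domain.startswith("#"):
                merged.add(domain)
    return sorted(merged)
```

Normalizing to lower case before merging means that the same domain appearing in two lists with different capitalization is only counted once.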
What about Segments?
Every time a new referral spam domain appears that has not yet been detected by the community it will make its way into your Analytics reports before you can identify it with ga-spam-control.
After ga-spam-control has identified the new spammer it can easily block it. But it cannot remove the existing spam entry from your analytics reports. This can only be done with segments.
Unfortunately Google Analytics Segments can only be created manually. So ga-spam-control currently can’t help you with that.
Using machine learning to detect referrer spam
Earlier versions of ga-spam-control contained a machine-learning component which tried to detect referrer spam by training a neural network.
Unfortunately I could not get it to work reliably enough for sites with different usage patterns, so I removed it. But I will build a machine-learning-based referrer spam detector that uses honeypot sites to help keep the referrer spam lists up-to-date.
Ideally Google would just include spam protection into Google Analytics and make this whole thing obsolete.
But until then I will use this tool as a playground and add some features here and there. The complete list of feature ideas is maintained in the README of the project: github.com/andreaskoch/ga-spam-control#roadmap
I will update this post when I release a new version of ga-spam-control (current version: v0.6.0).