Update: Unfortunately, as of February 2020 Google Analytics has deprecated "Service Provider" and "Network Domain".
Filters that use these dimensions as a condition will therefore no longer work, which is a pity because they were extremely useful for identifying and filtering bot traffic. The rest of the filters listed below still work, so you can keep using them.
I'm working on a new and more comprehensive way of identifying and filtering bot traffic. Stay tuned!
Filtering junk traffic in Google Analytics: A comprehensive solution
Google Analytics is probably one of the most important elements of the decision-making process of your website. The success or failure of your efforts (SEO, ad campaigns, social media, content marketing, etc.) can be easily determined by the accuracy and value of your GA reports.
If you don't take the appropriate measures, unwanted data such as bots, internal traffic, and spam will decrease that accuracy and in some cases lead to poor decisions.
The good news is that GA has powerful filter functionality that, if used well, will help you keep out all that junk traffic. The bad news is that, in my experience, most sites don't use it properly.
99% of the sites I audit either don't use filters at all or use them incorrectly, which can create an even bigger problem.
So to help you get data that you can trust, I will show you:
- The most effective ways of filtering bots, spam, and other junk traffic in your Google Analytics,
- And, just as important, how to do it safely so you don't risk your real user data.
Dos and don'ts when filtering data in Google Analytics
Filters are a powerful tool if used in the right way. So let's go quickly through a list of things you should consider when filtering in GA.
- Don't use the referral exclusion list to filter out traffic; that list has a completely different purpose.
- Don't handle spam individually; this is extremely inefficient and will become a nightmare to maintain.
- Don't worry about spam or bot data harming your SEO; GA data is not used for rankings in search results.
- Don't rely on server-side solutions like WordPress plugins or .htaccess rules to stop ghost spam; they have no effect because that type of spam never passes through your server.
First things first. Protect your data from misconfigurations
Before creating any filter in Google Analytics, you should have at least 2 views: one where you will apply the filters and a second one that you will leave unfiltered. The unfiltered view works as a backup and lets you check the progress of your filters. If you want to be extra cautious, you can create a test view to try out your filters before applying them.
Here you can find how to create and set best practices for views in Google Analytics.
5 types of filters to stop bots and spam in Google Analytics
Once your views are correctly configured, it's time to stop all of that dirty traffic that skews your reports and doesn't let you see the real performance of your site.
There is no single almighty solution or checkbox that will stop all junk traffic at once, so if you want accurate Analytics you will have to work for it.
The Google Analytics filters you will need are:
- Campaign source filters for crawler referral spam,
- Valid hostname filter for ghost spam and DEV environments,
- Language filter for bots,
- Static and dynamic filters for internal traffic,
- Bonus: Enable the built-in feature "Bot Filtering" (to exclude a few known bots)
Do you want me to do this for you? I can review your analytics and apply all the necessary filters and fixes to ensure you are receiving the most accurate data possible and your analytics settings are optimal.
General notes about filters.
- Do not apply filters on the Raw data view,
- If you are not comfortable working with filters yet, you can use them in a test view first,
- Filters only work forward, for historical data you should use a segment.
a. Campaign Source filter to stop Crawler referral spam
To block crawler spam you'll need a filter with an expression that matches the campaign source of all crawler spam.
To save you some time, I've created a set of optimized regular expressions (REGEX) covering all the relevant crawler spam detected over the last few years. You'll find them below in the instructions.
How to create a filter to block crawler referrer spam in Google Analytics
To block referrer spam in Google Analytics you will need to create an exclude filter using the campaign source:
- Again go to the admin section of your GA.
- In the last column, "VIEW", select Filters and then click + Add Filter
- Name the filter "Exclude Source - Bots #"
- Configure the filter as follows:
- Filter Type select Custom > Exclude
- Filter Field select Campaign Source (don't use referral field or it won't work)
- Filter Pattern > Paste the following crawler referrer spam expression.
These expressions were re-built to optimize the number of filters. If you created your filter before September 28, 2019, replace all the old expressions and remove any extra filter.
Create 1 filter for each expression
Crawler Expression 1 (total characters: 217):
semalt|ranksonic|timer4web|anticrawler|uptime(robot|bot|check|\-|\.com)|foxweber|:8888|xtraffic\.plus|(christopherblog|tammyblog|billyblog)\.online|traffic4free|bottraffic|easy-website\-traffic|bot4free|trafficbot
Crawler Expression 2 (total characters: 249):
(axcus|dotmass|artstart|dorothea|artpress|matpre|ameblo|freeseo|jimto|seo-tips|hazblog|overblog|squarespace|ronaldblog|c\.g456|zz\.glgoo|harriett|webedu|barbarahome|verabauer|deirdre|ninacecillia|reginanahum|deniseconnie|firstblog|maxinesamson)\.top
Get free notifications with the updated expressions whenever I detect new threats.
- After everything is set Save.
Note: These are common sources. If you find other referrals that are not useful for your Analytics (for example, mobile test sites, project management tools, monitoring services, or other spam that is not listed), you can create an additional filter with the exact same configuration.
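Before saving the filters, it can help to check offline which sources the expressions would catch. The following is a minimal Python sketch, not part of GA itself. It assumes that GA custom filters match the pattern anywhere in the field and ignore case unless told otherwise, and that the filter pattern field accepts at most 255 characters; verify both against your own GA property. The sample sources in the loop are made up for illustration.

```python
import re

# The two crawler expressions from this guide, copied verbatim.
CRAWLER_EXPRESSIONS = [
    r"semalt|ranksonic|timer4web|anticrawler|uptime(robot|bot|check|\-|\.com)"
    r"|foxweber|:8888|xtraffic\.plus|(christopherblog|tammyblog|billyblog)\.online"
    r"|traffic4free|bottraffic|easy-website\-traffic|bot4free|trafficbot",
    r"(axcus|dotmass|artstart|dorothea|artpress|matpre|ameblo|freeseo|jimto"
    r"|seo-tips|hazblog|overblog|squarespace|ronaldblog|c\.g456|zz\.glgoo"
    r"|harriett|webedu|barbarahome|verabauer|deirdre|ninacecillia|reginanahum"
    r"|deniseconnie|firstblog|maxinesamson)\.top",
]

# Assumption: GA filter patterns are matched case-insensitively by default.
COMPILED = [re.compile(expr, re.IGNORECASE) for expr in CRAWLER_EXPRESSIONS]

# Assumption: the GA filter pattern field is limited to 255 characters.
MAX_PATTERN_LENGTH = 255


def is_crawler_spam(source: str) -> bool:
    """Return True if any expression matches somewhere in the campaign source."""
    return any(pattern.search(source) for pattern in COMPILED)


def patterns_fit_limit() -> bool:
    """Return True if every expression fits in the assumed pattern field limit."""
    return all(len(expr) <= MAX_PATTERN_LENGTH for expr in CRAWLER_EXPRESSIONS)


if __name__ == "__main__":
    # Made-up sample sources, for illustration only.
    for source in ["semalt.com", "uptimerobot.com", "squarespace.top", "google"]:
        verdict = "excluded" if is_crawler_spam(source) else "kept"
        print(f"{source}: {verdict}")
```

Running your own list of sources through `is_crawler_spam` before saving is a cheap way to make sure an expression does not accidentally catch a legitimate referrer.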
b. Valid hostname filter to stop ghost spam and development environments
Nowadays ghost spam is less frequent than it was a couple of years ago. However, I still recommend having this filter in place in case a new wave arrives. It will also help you keep out useless traffic from dev sites and scrapers.
Here you will find detailed instructions on how to build a valid hostname filter.
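As a rough illustration of the idea only (the linked instructions remain the reference), a valid hostname filter is typically an include filter that keeps hits whose hostname is one you recognize. The Python sketch below builds such a pattern from a list of hostnames and checks sample values against it. The hostnames, the function names, and the choice to anchor the pattern with ^ and $ are all assumptions made for this example.

```python
import re

# Hypothetical hostnames: replace with the ones where your tracking code legitimately runs.
VALID_HOSTNAMES = ["example.com", "www.example.com", "shop.example.com"]


def build_hostname_pattern(hostnames):
    """Join escaped hostnames into one anchored alternation."""
    escaped = sorted(re.escape(name) for name in hostnames)
    return "^(" + "|".join(escaped) + ")$"


def is_valid_hostname(hostname, pattern):
    """Return True if the hostname matches the include pattern, ignoring case."""
    return re.search(pattern, hostname, re.IGNORECASE) is not None


if __name__ == "__main__":
    pattern = build_hostname_pattern(VALID_HOSTNAMES)
    print("Include pattern:", pattern)
    # Made-up sample hostnames, for illustration only.
    for hostname in ["www.example.com", "staging.example.com", "(not set)"]:
        verdict = "kept" if is_valid_hostname(hostname, pattern) else "dropped"
        print(f"{hostname}: {verdict}")
```

Anchoring the pattern is a deliberate choice in this sketch: without it, a dev hostname such as staging.example.com would also match example\.com and slip through.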
c. Language filter for sneaky crawlers and bots
From time to time you may see weird languages showing up in your analytics. I prepared an expression that will exclude any language value that doesn't have a proper format like es-ES, en-US, fr-FR, etc.
I also added to the expression the "Language c" which seems to be left by bots.
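To make "a proper format" concrete, here is a small Python sketch of the idea. The pattern in it is a hypothetical stand-in written for this illustration; it is not the expression to paste into GA. It assumes a well-formed language value is a two- or three-letter code optionally followed by hyphenated subtags, and it treats everything else, including the lone "c", as suspicious.

```python
import re

# Hypothetical stand-in pattern, NOT the expression used in the GA filter.
# Assumption: a well-formed value looks like "en", "en-US" or "zh-hans-cn".
WELL_FORMED_LANGUAGE = re.compile(r"^[a-z]{2,3}(-[a-z0-9]{2,8})*$", re.IGNORECASE)


def looks_like_bot_language(language: str) -> bool:
    """Return True if the language value does not have a well-formed shape."""
    return WELL_FORMED_LANGUAGE.match(language) is None


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    for value in ["es-ES", "en-US", "fr-FR", "c", "Visit spam-site now!"]:
        verdict = "suspicious" if looks_like_bot_language(value) else "ok"
        print(f"{value!r}: {verdict}")
```

Note that this stand-in would also flag values such as "(not set)", so treat it as a way to understand the approach rather than a drop-in rule.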
- Create a new filter with the following settings:
- Filter name: Exclude Language - Bots
- Filter configuration:
- Filter type: Custom > Exclude
- Filter field: Language Settings
- Filter pattern: enter the following expression as it is:
d. Static and dynamic filters for internal traffic
Not all junk traffic in Analytics comes from outside your company. In fact, a lot can come from within your team: developers, testers, marketers, support, curious employees, etc.
This type of junk traffic is often overlooked. If you don't filter it, it can easily get mixed up with the data from your real visits and, unlike spam, it is much harder to identify later.
e. Bonus: Enabling "Exclude all hits from known bots and spiders"
This is a pre-built feature that takes care of known bots from the IAB bots and spiders list. It is not perfect, but it may help.
How to enable bot filtering
- Again in the Admin section of your Analytics, select your Master view under the VIEW column. (Also for any other filtered view)
- Click View Settings
- Near the bottom check the box Exclude all hits from known bots and spiders (Bot Filtering)
- Save and repeat the process with all your Views
What's next? Clean junk traffic from past data
As you know, filters only work forward. To clean spam and bots from your historical data, you will need to create an advanced segment using this guide:
- Comprehensive spam/bot blacklist.
- Answers to common concerns about spam and bots in Google Analytics:
- Does the spam harm my SEO-Rankings?
- How does it get in your reports?
- and many more.
Wrapping it up
Your Google Analytics is as good as the data it contains. If you don't filter it properly you can end up with inflated reports that don't represent the real performance of your site.
"Even on high volume websites where data spamming would be marginal, you still have to explain why there's such a discrepancy. As an analyst you can't dismiss it simply by saying 'nah... we're not too sure what it is...'"
The filters and pre-built expressions in this guide will help you keep your Analytics data in good shape, so you can feel confident when you make decisions based on it.
I will be updating this guide as new threats appear so you can keep it as a reference.
Do you have any questions or feedback?
I've tried to cover every important detail in this guide. However, if you got stuck on any part of it, let me know in the comments section below.
Need help setting up reliable and useful Google Analytics for your website/business?