To make it easier to follow I split this guide into 4 parts. The first 2 are practical, the last 2 are informative:

Author

Analytics & GTM Developer

Optimizer Troubleshooter

Category | Google Analytics

Update: Unfortunately, as of February 2020 Google Analytics has deprecated "Service Provider" and "Network Domain".

Filters that use these dimensions as a condition will no longer work, which is a pity because they were extremely useful for identifying and filtering bot traffic. The rest of the filters listed below still work, so you can keep using them.

I'm working on a new and more comprehensive way of identifying and filtering bot traffic. Stay tuned!

Filtering junk traffic in Google Analytics: A comprehensive solution

Google Analytics is probably one of the most important elements of the decision-making process of your website. The success or failure of your efforts (SEO, ad campaigns, social media, content marketing, etc.) can be easily determined by the accuracy and value of your GA reports.

If you don't take the appropriate measures, unwanted data such as bots, internal traffic, and spam will decrease that accuracy and in some cases lead to poor decisions.

The good news is that GA has powerful filter functionality that, if used well, will help you keep out all that junk traffic. The bad news is that, in my experience, most sites don't use it properly.

99% of the sites I audit either don't use filters at all or use them incorrectly, which can create an even bigger problem.

So to help you get data that you can trust, I will show you:

  1. The most effective ways of filtering bots, spam, and other junk traffic in your Google Analytics,
  2. And, just as important, how to do it safely so you don't put your real user data at risk.

Latest identified (February 2020)

automatedtraffic4free.club / automatedtraffic4free.com / automatedtraffic4free.host / automatedtraffic4free.pw / bottraffic.host / bottraffic4free.host / bottraffic4free.pw / bottraffic4free.xyz / easy-website-traffic.com / getbottraffic4free.com / tracsistraffic.com / trafficbot.club / trafficbot4free.com / trafficbot4free.host / trafficbot4free.pw / trafficbot4free.xyz / websitebottraffic.com / websitebottraffic.pw / websitebottraffic.xyz

A quick FAQ about this guide

To save you some time looking through the comments, here are the answers to some of the most common questions I get:

  • Which type of spam and bots does this guide cover?
    • This guide will help you prevent common threats,
  • Does this work in WordPress, Joomla, Shopify, Wix, Weebly, Squarespace...?
    • Yes. The solutions below are purely based on GA filters, so it will work independently of the CMS you use.
  • How often do you check for new threats and update the expressions?
    • I'm constantly monitoring for bots and new spam (3-5 times a week), and I update the expressions when new significant threats are detected. You can keep the guide as a reference or even better you can get notified when new expressions are out. (See the historical spam blacklist)

Dos and don'ts when filtering data in Google Analytics

Filters are a powerful tool if used in the right way. So let's go quickly through a list of things you should consider when filtering in GA.

How does ghost spam attack Google Analytics?

First things first. Protect your data from misconfigurations

Before creating any filter in Google Analytics, you should have at least 2 views: one where you will apply the filters, and a second one that you will leave unfiltered. The unfiltered view works as a backup and lets you check the progress of your filters. If you want to be extra cautious, you can create a test view to try out your filters before applying them.

Here you can find how to create and set best practices for views in Google Analytics.

5 types of filters to stop bots and spam in Google Analytics

Once your views are correctly configured, it's time to stop all of that dirty traffic that skews your reports and doesn't let you see the real performance of your site.

There is no single almighty solution or checkbox that will stop all junk traffic at once, so if you want accurate Analytics you will have to work for it.

The Google Analytics filters you will need are:

  1. Campaign source filters for crawler referral spam,
  2. Valid hostname filter for ghost spam and DEV environments,
  3. Language filter for bots,
  4. Static and dynamic filters for internal traffic,
  5. Bonus: Enable the built-in feature "Bot Filtering" (to exclude a few known bots)

Do you want me to do this for you? I can review your analytics and apply all the necessary filters and fixes to ensure you are receiving the most accurate data possible and your analytics settings are optimal.

General notes about filters.

a. Campaign Source filter to stop Crawler referral spam

To block crawler spam you'll need a filter with an expression that matches the campaign source of all crawler spam.

To save you some time, I've created a set of optimized regular expressions (regex) covering all the relevant crawler spam detected over the last few years. You'll find them below in the instructions.

How to create a filter to block crawler referrer spam in Google Analytics

To block referrer spam in Google Analytics you will need to create an exclude filter using the campaign source:

  1. Again go to the admin section of your GA.
  2. In the last column, "VIEW", select Filters and then click + Add Filter
  3. Enter as a name for the filter "Exclude Source - Bots #"
  4. Configure the filter as follows:
    • Filter Type select Custom > Exclude
    • Filter Field select Campaign Source (don't use referral field or it won't work)
  5. Filter Pattern > Paste the following crawler referrer spam expression.
    These expressions were re-built to optimize the number of filters. If you created your filter before September 28, 2019, replace all the old expressions and remove any extra filter.

    Create 1 filter for each expression

    Crawler Expression 1

    TOTAL CHARACTERS: 217
    semalt|ranksonic|timer4web|anticrawler|uptime(robot|bot|check|\-|\.com)|foxweber|:8888|xtraffic\.plus|(christopherblog|tammyblog|billyblog)\.online|traffic4free|bottraffic|easy-website\-traffic|bot4free|trafficbot

    Crawler Expression 2

    TOTAL CHARACTERS: 249
    (axcus|dotmass|artstart|dorothea|artpress|matpre|ameblo|freeseo|jimto|seo-tips|hazblog|overblog|squarespace|ronaldblog|c\.g456|zz\.glgoo|harriett|webedu|barbarahome|verabauer|deirdre|ninacecillia|reginanahum|deniseconnie|firstblog|maxinesamson)\.top

    Get free notifications with the updated expressions whenever I detect new threats.

  6. After everything is set, click Save.

Note: These are common sources. You can create an additional filter with the exact same configuration if you find other referrals that are not useful for your Analytics, for example mobile test sites, project management tools, monitoring services, or other spam that is not listed.
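
If you want to sanity-check the expressions before pasting them into a filter, here is a minimal Python sketch that tests campaign sources against Crawler Expression 1 and 2. It assumes that a GA filter pattern matches anywhere in the field and ignores case unless you tick "Case Sensitive", and it uses Python's regex engine as a stand-in for GA's, so treat the result as an approximation. The function names are mine, not part of GA.

```python
import re

# Expressions copied verbatim from this guide.
CRAWLER_EXPRESSION_1 = (
    r"semalt|ranksonic|timer4web|anticrawler|uptime(robot|bot|check|\-|\.com)"
    r"|foxweber|:8888|xtraffic\.plus|(christopherblog|tammyblog|billyblog)\.online"
    r"|traffic4free|bottraffic|easy-website\-traffic|bot4free|trafficbot"
)
CRAWLER_EXPRESSION_2 = (
    r"(axcus|dotmass|artstart|dorothea|artpress|matpre|ameblo|freeseo|jimto"
    r"|seo-tips|hazblog|overblog|squarespace|ronaldblog|c\.g456|zz\.glgoo"
    r"|harriett|webedu|barbarahome|verabauer|deirdre|ninacecillia|reginanahum"
    r"|deniseconnie|firstblog|maxinesamson)\.top"
)

# Assumption: partial, case-insensitive matching, like a default GA filter.
CRAWLER_PATTERNS = [
    re.compile(expression, re.IGNORECASE)
    for expression in (CRAWLER_EXPRESSION_1, CRAWLER_EXPRESSION_2)
]

# The "Latest identified (February 2020)" sources listed near the top of the guide.
LATEST_IDENTIFIED = [
    "automatedtraffic4free.club", "automatedtraffic4free.com",
    "automatedtraffic4free.host", "automatedtraffic4free.pw",
    "bottraffic.host", "bottraffic4free.host", "bottraffic4free.pw",
    "bottraffic4free.xyz", "easy-website-traffic.com",
    "getbottraffic4free.com", "tracsistraffic.com", "trafficbot.club",
    "trafficbot4free.com", "trafficbot4free.host", "trafficbot4free.pw",
    "trafficbot4free.xyz", "websitebottraffic.com", "websitebottraffic.pw",
    "websitebottraffic.xyz",
]


def is_crawler_spam(source: str) -> bool:
    """Return True if either expression matches the campaign source."""
    return any(pattern.search(source) for pattern in CRAWLER_PATTERNS)


def unmatched(sources):
    """Return the sources that neither expression would exclude."""
    return [source for source in sources if not is_crawler_spam(source)]


if __name__ == "__main__":
    for source in ["semalt.com", "freeseo.top", "google", "newsletter"]:
        print(source, "->", "excluded" if is_crawler_spam(source) else "kept")
    print("Not covered:", unmatched(LATEST_IDENTIFIED))
```

Under these assumptions, the only source from the February 2020 list that the sketch reports as not covered is tracsistraffic.com, so it may need its own entry in an additional filter. You can reuse `unmatched` with the referrals you see in your own reports to decide whether an extra filter is worth creating.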

b. Valid hostname filter to stop ghost spam and development environments

Nowadays ghost spam is less frequent than it was a couple of years ago. However, I still recommend having this filter in place in case a new wave arrives. It will also help you keep out useless traffic from dev sites and scrapers.


Here you will find detailed instructions on how to build a valid hostname filter.
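
The linked instructions are the reference for building the real filter. As a rough illustration of the idea only, the sketch below joins a list of hostnames into one expression of the kind you could use in an include filter on the hostname field, and then checks which hits would be kept. The hostnames are hypothetical placeholders and the helper names are mine; matching is assumed to be partial and case-insensitive.

```python
import re

# Hypothetical hostnames where your tracking code legitimately runs.
VALID_HOSTNAMES = [
    "example.com",
    "shop.example.com",
    "example-payments.com",
]


def build_hostname_expression(hostnames):
    """Escape each hostname and join them with | into a single expression."""
    return "|".join(re.escape(hostname) for hostname in hostnames)


def is_valid_hostname(hostname: str, expression: str) -> bool:
    """Return True if an include filter with this expression would keep the hit."""
    return re.search(expression, hostname, re.IGNORECASE) is not None


if __name__ == "__main__":
    expression = build_hostname_expression(VALID_HOSTNAMES)
    print(expression)
    for hostname in ["www.example.com", "(not set)", "localhost", "dev.mysite.test"]:
        verdict = "kept" if is_valid_hostname(hostname, expression) else "filtered out"
        print(hostname, "->", verdict)
```

Because the match is partial, "www.example.com" is kept without being listed separately, while hits with no hostname or with a development hostname are filtered out.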

c. Language filter for sneaky crawlers and bots

From time to time you may see weird languages showing up in your analytics. I prepared an expression that excludes any language that doesn't have a proper format like es-ES, en-US, fr-FR, etc.

I also added the language value "c" to the expression, which seems to be left by bots.

  • Create a new filter with the following settings:
    • Filter name: Exclude Language - Bots
    • Filter configuration:
      • Filter type: Custom > Exclude
      • Filter field: Language Settings
      • Filter pattern: enter the following expression as it is:
        \s[^\s]*\s|.{15,}|\.|,|^c$
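
To see what the language expression catches, here is a small Python sketch that runs it against a few sample values. The expression is the one above; the sample values and the function name are only illustrative, and Python's regex engine stands in for GA's, with matching assumed to be partial and case-insensitive.

```python
import re

# Expression copied verbatim from this guide.
LANGUAGE_BOT_EXPRESSION = r"\s[^\s]*\s|.{15,}|\.|,|^c$"
LANGUAGE_BOT_PATTERN = re.compile(LANGUAGE_BOT_EXPRESSION, re.IGNORECASE)


def is_bot_language(language: str) -> bool:
    """Return True if the exclude filter would drop a hit with this language.

    The expression matches values that contain at least two whitespace
    characters, are 15 or more characters long, contain a dot or a comma,
    or are exactly "c".
    """
    return LANGUAGE_BOT_PATTERN.search(language) is not None


if __name__ == "__main__":
    samples = ["en-us", "es-es", "fr-fr", "c", "Vote for Trump!", "secret.example.com"]
    for language in samples:
        print(language, "->", "excluded" if is_bot_language(language) else "kept")
```

Regular codes such as en-us, es-es, and fr-fr are kept, while "c" and anything that looks like a sentence or a domain name is excluded.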

d. Static and dynamic filters for internal traffic

Not all junk traffic in Analytics comes from outside your company. In fact, a lot can come from within your team: developers, testers, marketers, support, curious employees, etc.

This type of junk traffic is often overlooked. If you don't filter it, it can easily get mixed up with the data from your real visits and, unlike spam, it is much harder to identify later.

e. Bonus: Enabling "Exclude all hits from known bots and spiders"

This is a pre-built feature that takes care of known bots from the IAB bots and spiders list. It is not perfect, but it may help.

How to enable bot filtering

  1. Again in the Admin section of your Analytics, select your Master view under the VIEW column. (Also for any other filtered view)
  2. Click View Settings
  3. Near the bottom check the box Exclude all hits from known bots and spiders (Bot Filtering)
  4. Save and repeat the process with all your Views

What's next? Clean junk traffic from past data

As you know, filters only work going forward. To clean spam and bots from your historical data you will need to create an advanced segment using this guide:

Additional resources

Wrapping it up

Your Google Analytics is as good as the data it contains. If you don't filter it properly you can end up with inflated reports that don't represent the real performance of your site.

"Even on high volume websites where data spamming would be marginal, you still have to explain why there's such a discrepancy. As an analyst you can't dismiss it simply by saying 'nah... we're not too sure what it is...'"

-Stéphane Hamel

The filters and pre-built expressions in this guide will help you keep your Analytics data in good shape, so you can feel confident when you make decisions based on it.

I will be updating this guide as new threats appear so you can keep it as a reference.

Do you have any questions or feedback?

I've tried to cover every important detail in this guide, however, if there is any part of the guide where you got stuck, let me know in the comments section below.

Need help setting up reliable and useful Google Analytics for your website/business?

  • Filters for data quality
  • User interaction tracking (events, goals)
  • E-commerce tracking
  • Conversions, Goal & Funnel Configuration
  • Sub-domains & Cross-domain tracking
  • Dynamic IP filtering
  • Google Tag Manager implementation
  • AMP tracking/integration
  • Integrations (Google Ads, Search Console, Facebook Ads, etc)
  • Personalized reports (Data Studio dashboards)
  • Monthly reporting
  • And more...