To make it easier to follow I split this guide into 4 parts. The first 2 are practical, the last 2 are informative:

Author

Analytics & GTM Developer

Optimizer Troubleshooter

Category | Google Analytics

Filtering spam and bot traffic in Google Analytics: A comprehensive solution

Google Analytics is probably one of the most important elements of the decision-making process for your website. Whether you can judge the success or failure of your efforts (SEO, ad campaigns, social media, etc.) depends on the accuracy of your data.

If you don't take the appropriate steps, unwanted data such as bots, internal traffic, and spam will decrease that accuracy and could lead to poor decision-making.

The good news is that GA has powerful filter functionality that, if used well, will help you keep junk traffic out of your reports. The bad news is that, in my experience, most sites don't use it correctly.

99% of the sites I audit either have no filters at all or have filters that are not correctly configured, which can create an even bigger problem.

So to help you get data that you can trust, I will show you:

  1. The most effective ways of filtering bots, spam, and other junk traffic in your Google Analytics,
  2. And, very importantly, how to do it safely so you don't risk your real user data.

A quick FAQ about this guide

To save you some time looking through the comments, here are the answers to some of the most common questions I get:

  • Which type of junk traffic does this guide cover?
    • Most common types: bots, crawler spam, ghost spam, internal traffic.
  • Does this work in WordPress, Joomla, Shopify, Wix, Weebly, Squarespace...?
    • Yes. The solutions below are purely based on GA filters, so they will work independently of the CMS you use.
  • How often do you check for new threats and update the expressions?
    • I'm constantly monitoring for bots and new spam (3-5 times a week), and I update the expressions when new significant threats are detected. You can keep the guide as a reference or even better you can get notified when new expressions are out. (See the historical spam blacklist)

Dos and don'ts when filtering data in Google Analytics

Filters are a powerful tool if used in the right way. So let's quickly go through a list of things you shouldn't do:

Now with that out of the way, let's continue with the solution.

First things first. Protect your data from misconfigurations

Before creating any filter in Google Analytics, you should have at least 2 views: one where you will apply the filters and a second one that you will leave unfiltered. Because filters permanently change the data a view collects, the unfiltered view works as a backup and lets you check the effect of your filters. If you want to be extra cautious, you can create a test view to try out your filters before applying them.

Here you can find how to create and set best practices for views in Google Analytics.

The 6 filters you need to stop bots and spam in Google Analytics

Once your views are correctly configured, it's time to stop all of that dirty traffic that skews your reports and doesn't let you see the real performance of your site.

It is important to know that there is no single almighty solution or checkbox that will stop all junk traffic at once, so if you want accurate Analytics you will have to work for it.

General notes about filters.

The Google Analytics filters you will need are:

a. Filter - Campaign Source to stop crawler referral spam

To block crawler spam you'll need a filter with an expression that matches the campaign source of all crawler spam.

To save you some time, I've created a set of optimized regular expressions (REGEX) with all the relevant crawler spam detected over the last few years. You'll find them below in the instructions.

Here is how it works (test it yourself):

How to create a filter to block crawler referrer spam in Google Analytics

To block referrer spam in Google Analytics you will need to create an exclude filter using the campaign source:

  1. Go to the Admin section of your GA.
  2. In the last column, "VIEW", select Filters and then click + Add Filter.
  3. Enter as a name for the filter "Exclude Source - Bots #"
  4. Configure the filter as follows:
    • Filter Type select Custom > Exclude
    • Filter Field select Campaign Source (don't use referral field or it won't work)
  5. Filter Pattern > Paste the following crawler referrer spam expression.

    Create 1 filter for each expression

    Crawler Expression 1

    TOTAL CHARACTERS: 50
    (traffic|bot|website)-?(bot|traffic|website|4free)

    Crawler Expression 2

    TOTAL CHARACTERS: 249
    (axcus|dotmass|artstart|dorothea|artpress|matpre|ameblo|freeseo|jimto|seo-tips|hazblog|overblog|squarespace|ronaldblog|c\.g456|zz\.glgoo|harriett|webedu|barbarahome|verabauer|deirdre|ninacecillia|reginanahum|deniseconnie|firstblog|maxinesamson)\.top

    These expressions were re-built in February 2021. If you created your filter before then, replace all the old expressions and remove any extra filter.

     


  6. After everything is set, click Save.

You can create an additional filter with the exact same configuration if you find other referrals that are not useful for your Analytics, for example, mobile test sites, project management tools (Basecamp, Asana), monitoring services (uptime), or other spam that is not listed.
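Before saving a filter, it can help to check an expression against a few source values outside of GA. The following Python sketch does that for the two crawler expressions from the steps above. It assumes that GA matches a filter pattern anywhere in the field and ignores case unless "Case Sensitive" is ticked; Python's `re` module only approximates GA's regex handling, and the `is_crawler_spam` helper and the sample sources are illustrative.

```python
import re

# The two crawler expressions from this guide, copied as-is.
CRAWLER_EXPRESSIONS = [
    r"(traffic|bot|website)-?(bot|traffic|website|4free)",
    r"(axcus|dotmass|artstart|dorothea|artpress|matpre|ameblo|freeseo|jimto"
    r"|seo-tips|hazblog|overblog|squarespace|ronaldblog|c\.g456|zz\.glgoo"
    r"|harriett|webedu|barbarahome|verabauer|deirdre|ninacecillia"
    r"|reginanahum|deniseconnie|firstblog|maxinesamson)\.top",
]

# Assumption: partial, case-insensitive matching, approximated with re.search.
COMPILED = [re.compile(expr, re.IGNORECASE) for expr in CRAWLER_EXPRESSIONS]


def is_crawler_spam(source: str) -> bool:
    """Return True if any crawler expression matches the campaign source."""
    return any(pattern.search(source) for pattern in COMPILED)


if __name__ == "__main__":
    # Sample campaign sources for illustration only.
    for source in ["trafficbot.life", "seo-tips.top", "google", "squarespace.com"]:
        verdict = "excluded" if is_crawler_spam(source) else "kept"
        print(f"{source}: {verdict}")
```

Note that the second expression only matches sources ending in the .top pattern, so a legitimate source such as squarespace.com is kept while squarespace.top is excluded.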

b. Filter - Valid hostname for ghost spam and development environments

Nowadays, ghost spam is less frequent than it was a couple of years ago. However, I still recommend having this filter in place in case a new wave arrives.

Also, this filter will help prevent useless traffic from development/staging sites and scrapers.

Here you will find detailed instructions on how to build a valid hostname filter.
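
The linked guide covers the details. As a rough illustration of the idea only, one common approach is to keep hits whose hostname matches the hostnames you actually use and drop everything else. The Python sketch below assumes a hypothetical site, `example.com`, and its subdomains; the expression and the `has_valid_hostname` helper are placeholders, not the expressions from that guide.

```python
import re

# Hypothetical expression: replace example.com with the hostnames
# where your own tracking code legitimately runs.
VALID_HOSTNAMES = re.compile(r"(^|\.)example\.com$", re.IGNORECASE)


def has_valid_hostname(hostname: str) -> bool:
    """Return True if the hit's hostname is one of your own hostnames."""
    return VALID_HOSTNAMES.search(hostname) is not None


if __name__ == "__main__":
    # An empty string stands in for a hit that arrives without a hostname.
    for hostname in ["www.example.com", "shop.example.com", "", "staging.example.dev"]:
        verdict = "kept" if has_valid_hostname(hostname) else "dropped"
        print(f"{hostname!r}: {verdict}")
```

Anchoring the expression at the end keeps a lookalike such as example.com.spam.net from passing as valid.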

c. Filter - Browser size (not set)

The previous filter used to be great for ghost spam sent through the Measurement Protocol. However, spammers keep getting creative: some of them now crawl sites to grab their hostname and Analytics UA ID, which lets them bypass the hostname filter. In those cases, this filter can help.

Important: If you are using 3rd party tools that send data to your Google Analytics through the Measurement Protocol, e.g. call tracking tools like CallRail, don't use this filter; skip to the next one.

  • Create a new filter with the following settings:
    • Filter name: Exclude Browser Size - Spam
    • Filter configuration:
      • Filter type: Custom > Exclude
      • Filter field: Browser size
      • Filter pattern: enter the following expression exactly as it is:
        ^$
        Note: even though you see (not set) in your Google Analytics reports, this value is not added until the hit gets to your GA, so creating filters for (not set) in any dimension won't have any effect. Instead, use a REGEX that matches an empty value: ^$

You can use this same REGEX to filter any other (not set) dimension if you need to, e.g. Language, Browser version, etc.
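
To see why the pattern is `^$` rather than the text (not set), the following Python sketch compares both against an empty value. It assumes, as described above, that the field is still empty when the filter runs; `matches_empty` is an illustrative helper, and Python's `re` module only approximates GA's regex handling.

```python
import re

# The expression from this guide: start of value immediately followed by end.
EMPTY_VALUE = re.compile(r"^$")

# A pattern looking for the literal label shown in reports, for comparison.
NOT_SET_LABEL = re.compile(r"\(not set\)")


def matches_empty(value: str) -> bool:
    """Return True if the value is empty, i.e. the filter would exclude the hit."""
    return EMPTY_VALUE.search(value) is not None


if __name__ == "__main__":
    print(matches_empty(""))           # empty Browser size: excluded
    print(matches_empty("1920x1080"))  # regular Browser size: kept
    # The literal label never matches an empty field, so it filters nothing.
    print(NOT_SET_LABEL.search("") is not None)
```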

d. Filter - Language for sneaky crawlers and bots

From time to time you may see weird languages showing up in your Analytics. I prepared an expression that will exclude any language value that doesn't have a proper format like es-ES, en-US, fr-FR, etc.

I also added the language "c" to the expression, since it also seems to be left by bots.

  • Create a new filter with the following settings:
    • Filter name: Exclude Language - Bots
    • Filter configuration:
      • Filter type: Custom > Exclude
      • Filter field: Language Settings
      • Filter pattern: enter the following expression as it is:
        \s[^\s]*\s|.{15,}|\.|,|^c$
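
The expression above combines five alternatives: a value with two or more whitespace characters, a value of 15 or more characters, a value containing a dot, a value containing a comma, and the exact value "c". The Python sketch below checks it against a few sample values. It assumes partial, case-insensitive matching; `is_bot_language` is an illustrative helper, the spam-like samples are made up, and Python's `re` module only approximates GA's regex handling.

```python
import re

# The language expression from this guide, copied as-is.
LANGUAGE_BOTS = re.compile(r"\s[^\s]*\s|.{15,}|\.|,|^c$", re.IGNORECASE)


def is_bot_language(language: str) -> bool:
    """Return True if the language value looks malformed and would be excluded."""
    return LANGUAGE_BOTS.search(language) is not None


if __name__ == "__main__":
    samples = [
        "en-us",                   # well-formed
        "es-es",                   # well-formed
        "c",                       # bare "c"
        "Vote for this site now",  # made-up spam message with spaces
        "spam.example.com",        # made-up value containing a dot
    ]
    for value in samples:
        verdict = "excluded" if is_bot_language(value) else "kept"
        print(f"{value!r}: {verdict}")
```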

e. Filter - Static and dynamic filters for internal traffic

Not all junk traffic in Analytics comes from outside your company. In fact, a lot can come from within your team: developers, testers, marketers, support, curious employees, etc.

This type of junk traffic is often overlooked. If you don't filter it, it can easily get mixed up with the data from your real visits and, unlike spam, it is much harder to identify later.

f. Bonus - Enabling "Exclude all hits from known bots and spiders"

This is a pre-built feature that takes care of known bots on the IAB bots and spiders list. It is not perfect, but it may help.

How to enable bot filtering

  1. Again in the Admin section of your Analytics, select your Master view under the VIEW column. (Also for any other filtered view)
  2. Click View Settings
  3. Near the bottom check the box Exclude all hits from known bots and spiders (Bot Filtering)
  4. Save and repeat the process with all your Views

What's next? Clean junk traffic from past data

As you know, filters only work going forward. To clean spam and bots from your historical data, you will need to create an advanced segment using this guide:

Additional resources

Wrapping it up

Your Google Analytics is only as good as the data it contains. If you don't filter it properly, you can end up with inflated reports that don't represent the real performance of your site.

"Even on high volume websites where data spamming would be marginal, you still have to explain why there's such a discrepancy. As an analyst you can't dismiss it simply by saying 'nah... we're not too sure what it is...'"

-Stéphane Hamel

The filters and pre-built expressions in this guide will help you keep your Analytics data in good shape, so you can feel confident when you make decisions based on it.

I will be updating this guide as new threats appear so you can keep it as a reference.

Do you have any questions or feedback?

I've tried to cover all the important details in this guide, however, if there is any section where you are experiencing difficulties, please let me know in the comments section below.

If this article helped you, consider leaving a comment below with your experience. It may help other people! :)
