To make it easier to follow I split this guide into 4 parts. The first 2 are practical, the last 2 are informative:

Author

Analytics & GTM Expert

UX-SEO Advocate

Category | Google Analytics

Filtering irrelevant traffic in Google Analytics: A comprehensive solution

Google Analytics is probably one of the most important elements in your website's decision-making process. How well you can judge the success or failure of your efforts (SEO, ad campaigns, social media, content marketing, etc.) depends on the accuracy and value of your GA reports.

If you don't take the appropriate measures, unwanted data such as bots, internal traffic, and spam will decrease the accuracy of your reports and lead to poor decisions. The good news is that GA has a powerful filter feature that, if used well, will help prevent all that junk traffic. The bad news is that, in my experience, most sites don't use it properly.

99% of the sites I audit either don't use filters at all or use them incorrectly, which can create an even bigger problem.

To help you get data that you can trust I will show you:

  1. The most efficient ways of filtering bots, spam, and other junk traffic in your Google Analytics.
  2. Just as important, how to do it safely so you don't put your real user data at risk.

A quick FAQ about this guide

To save you some time looking through the comments, here are the answers to some of the most common questions I get:

  • Which type of spam and bots does this guide cover?
    • The most common ones. This guide will help you prevent common threats.
  • Does this work in WordPress, Joomla, Shopify, Wix, Weebly, Squarespace...?
    • Yes. The solutions below are purely based on GA filters, so they work regardless of the CMS you use.
  • How often do you check for new threats and update the expressions?
    • I'm constantly monitoring for bots and new spam (3-5 times a week), and I update the expressions when significant new threats are detected. You can keep the guide as a reference or, even better, get notified when new expressions are out (see the historical spam blacklist).

Dos and don'ts when filtering data in Google Analytics

Filters are very powerful if used in the right way. So let's go quickly through a list of things you should consider when filtering in GA.

How does ghost spam attack Google Analytics?

First things first. Protect your data from misconfigurations

Before creating any filter in GA, make sure you have at least 2 views: one where you will apply the filters, and a second one that you will leave unfiltered. The unfiltered view works as a backup and lets you check the effect of your filters.

If you need help with this, here you can find the best practices for views in Google Analytics.

5 types of filters to stop bots and spam in Google Analytics

Once your views are correctly configured, it's time to stop all of the dirty traffic that skews your reports and doesn't let you see the real performance of your site.

There is no single all-powerful solution or checkbox that will stop all junk traffic at once, so if you want accurate Analytics data, you will have to work for it.

The Google Analytics filters you will need, in order of importance, are:

  1. ISP organization / Service Provider filters for common bot traffic,
  2. ISP domain / Network filter to stop common bot traffic,
  3. Valid hostname filter for ghost spam and DEV environments
  4. Campaign source filters for crawler referral spam,
  5. Language filter for bots,
  6. Extra: Enable the built-in feature "Bot Filtering" (to exclude a few known bots)

Don't have the time to go through this? I can review your analytics and apply all the necessary fixes to ensure you are receiving the most accurate data possible.

General notes about filters.

  • Do not apply filters on the Raw data view,
  • If you are not comfortable working with filters yet, you can use them in a test view first,
  • Filters only work going forward; for historical data you will need to use a segment.

a. ISP organization / Service provider filter to stop bot traffic

One of the largest sources of useless traffic in Google Analytics is bots. There are all kinds of bots crawling the internet. There are bad ones, like scrapers, but there are also good ones, like those used for indexing, testing, and security, for example, Google bots coming from "Google LLC".

Or the ones coming from the ISP "Facebook Ireland Ltd". These are related to Facebook and Instagram Ads, probably checking that the pages in the ads are not spammy.

In either case, the data coming from them is totally irrelevant and should be filtered. For these filters, we will use the ISP organization.

In most of the Analytics properties I have audited, the ISPs below account for between 10% and 30% of bot traffic.

List of the most common bot ISPs:

  • alibaba.com llc
  • amazon data services brazil
  • amazon data services canada
  • amazon data services france
  • amazon data services india
  • amazon data services ireland limited
  • amazon data services ireland ltd
  • amazon data services japan
  • amazon data services nova
  • amazon data services singapore
  • amazon data services uk
  • amazon technologies inc.
  • amazon.com inc. *
  • chinanet fujian network
  • chinanet fujian province network
  • digitalocean llc *
  • early registration addresses
  • evercompliant ltd.
  • facebook ireland ltd
  • google corporate network
  • google inc.
  • google llc
  • google switzerland gmbh
  • hubspot
  • inktomi corporation
  • kazooisyee
  • linode *
  • linode llc *
  • linode llc sg
  • microsoft corp
  • microsoft corporation
  • online sas
  • online sas nl
  • ovh hosting inc.
  • putian city fujian provincial network of cncgroup
  • putian city fujian provincial network of unicom
  • vultr holdings llc
  • vultr holdings llc frankfurt

* Make sure to test the ISPs marked with an asterisk before filtering them.

Note: These are just a few examples of ISPs with high bot activity; the expressions below contain more and are constantly updated.

  • Create a new filter with the following settings (1 for each expression):
    • Filter Name: Exclude ISP - Bots #
    • Filter configuration:
      • Filter type: Custom > Exclude
      • Filter field: ISP organization
      • Filter pattern: enter the following expressions as they are:

        IMPORTANT

        I extensively test the expressions below across many GA properties to avoid interference with real user data. However, in very few cases the expressions could match some real user data.

        For example, the expression below blocks the ISP "Google llc" and "Microsoft corp", which are ISPs used in these companies' offices.

        In most cases, visits with those ISPs come from bots; however, they can also come from employees, so if your site sells a product to those companies, just remove them from the expression. If you are not sure, you can test the expressions before applying them to your GA.

        ISP Bot Expression 1

        TOTAL CHARACTERS: 255
        hubspot|^google\sllc$|^google\sinc\.$|alibaba\.com\sllc|ovh\shosting\sinc\.|microsoft\scorp|facebook\sireland\sltd|online\ssas|evercompliant|early\sregistration\saddresses|inktomi\scorporation|google\scorporate|google\sswitzerland\sgmbh|kazooisyee|cloud69

        ISP Bot Expression 2

        TOTAL CHARACTERS: 50
        amazon\sdata\sservices|vultr\sholdings|hos\-329450

        ISP Bot Expression *

        TOTAL CHARACTERS: 122
        The following filter may help you prevent large amounts of bot traffic; however, you should test it in your Analytics before applying it.
        chinanet\sfujian|putian\scity\sfujian|linode\sllc|amazon\.com\sinc\.|amazon\stechnologies\sinc\.|digitalocean\sllc|linode$
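If you want to check the ISP expressions before pasting them into GA, a short script can help. The Python sketch below copies the three expressions verbatim, checks each one against a 255-character pattern limit (my assumption about GA's filter-pattern maximum, consistent with Expression 1 using exactly 255 characters), and runs them against a few hypothetical ISP organization values. It also assumes GA applies filter patterns as an unanchored, case-insensitive search, which as far as I know is the default for custom filters.

```python
import re

# The three ISP organization expressions from this guide, copied verbatim
# (adjacent raw strings are concatenated by Python into one pattern).
ISP_EXPRESSIONS = {
    "ISP Bot Expression 1": (
        r"hubspot|^google\sllc$|^google\sinc\.$|alibaba\.com\sllc|ovh\shosting\sinc\.|"
        r"microsoft\scorp|facebook\sireland\sltd|online\ssas|evercompliant|"
        r"early\sregistration\saddresses|inktomi\scorporation|google\scorporate|"
        r"google\sswitzerland\sgmbh|kazooisyee|cloud69"
    ),
    "ISP Bot Expression 2": r"amazon\sdata\sservices|vultr\sholdings|hos\-329450",
    "ISP Bot Expression *": (
        r"chinanet\sfujian|putian\scity\sfujian|linode\sllc|amazon\.com\sinc\.|"
        r"amazon\stechnologies\sinc\.|digitalocean\sllc|linode$"
    ),
}

# Assumption: GA filter patterns are limited to 255 characters.
GA_PATTERN_LIMIT = 255


def is_excluded(isp_organization: str) -> bool:
    """True if any exclude expression matches (unanchored, case-insensitive search)."""
    return any(
        re.search(pattern, isp_organization, re.IGNORECASE)
        for pattern in ISP_EXPRESSIONS.values()
    )


for name, pattern in ISP_EXPRESSIONS.items():
    re.compile(pattern)  # raises re.error if the pattern is malformed
    print(f"{name}: {len(pattern)} characters, fits: {len(pattern) <= GA_PATTERN_LIMIT}")

# Hypothetical values as they might appear under the ISP organization dimension.
for isp in ["google llc", "google llc sg", "amazon data services ireland ltd", "example cable llc"]:
    print(f"{isp!r} -> {'excluded' if is_excluded(isp) else 'kept'}")
```

Note how "google llc sg" is kept: the ^ and $ anchors in ^google\sllc$ restrict that part of the expression to the exact value "google llc".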

b. ISP domain/network domain filter to stop bot traffic

This filter is similar to the previous one but this time we will target the ISP domain/network domain.

This one includes a Googlebot that started appearing around June 2019 with the following network domains:

  • googlebot.com
  • googleusercontent.com
  • google.com

  • Create a new filter with the following settings:
    • Filter Name: Exclude ISP Domain - Bots 1
    • Filter configuration:
      • Filter type: Custom > Exclude
      • Filter field: ISP domain
      • Filter pattern: enter the following expression as it is:

        paloaltonetworks|scaleway|kcura|^google(\.com$|usercontent\.com|bot\.com)$

This filter also covers the fake "amazon" keywords that show up as Bing organic traffic from the network domain paloaltonetworks.com.
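To see how the anchors in this expression behave, the Python sketch below runs it against a few hypothetical ISP domain values. As before, it assumes GA matches filter patterns as an unanchored, case-insensitive search.

```python
import re

# ISP domain expression from this guide, copied verbatim.
ISP_DOMAIN_EXPRESSION = r"paloaltonetworks|scaleway|kcura|^google(\.com$|usercontent\.com|bot\.com)$"


def is_excluded(network_domain: str) -> bool:
    # Unanchored, case-insensitive search (assumed GA behavior). The ^ and $
    # anchors only apply to the google alternative, so a lookalike such as
    # "google.co.uk" is not excluded.
    return re.search(ISP_DOMAIN_EXPRESSION, network_domain, re.IGNORECASE) is not None


# Hypothetical ISP domain values.
for domain in ["googlebot.com", "googleusercontent.com", "google.com",
               "paloaltonetworks.com", "google.co.uk", "example-isp.net"]:
    print(f"{domain} -> {'excluded' if is_excluded(domain) else 'kept'}")
```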

c. Valid hostname filter to stop ghost spam and development environments

The valid hostname filter is the single most effective solution against spam. This filter will permanently stop all ghost spam with fake hostnames no matter how it comes or what name it uses.


Do not confuse the hostname with the source!

What is a hostname vs. a source?

To avoid confusion while preparing the filters, I'll briefly explain the difference:

  • The Source is where your visit comes from and there can be any number of them, for example, Facebook, Google, Twitter, Youtube, links from other sites to your site, etc.
  • The hostname, on the other hand, is the site where the visitor arrives. Your main hostname will be your domain and, depending on the configuration of your site, there may be others.

Preparing your hostname filter

I will quickly describe the steps to build your hostname expression. If you need more help, you can find here detailed instructions on how to build a valid hostname expression.

To build this filter, you will need to:

  1. Make a list of your hostnames
    • A small-medium site usually has between 1 and 5 valid hostnames.
    • The most important hostname will be your main domain,
    • To see a list of all the active hostnames you need to go to the Network report in your Analytics:
      • Audience > Technology > Network
    • Change the primary dimension to Hostname (blue text at the top of the report).
    • Make a list of all the valid ones you find.
  2. Build your hostname expression: Once you have the list of all your hostnames, put them together, separating them with a pipe "|" character.
    If you need extra help finding your valid hostnames and building your expression, check this guide or let me know and I can help you.
  3. Create the filter: Once you are sure the expression is correct, create the filter as follows.
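If you prefer to script step 2, the Python sketch below joins a list of hostnames with pipes, escapes each dot (so "." matches only a literal dot instead of any character, as in the other expressions in this guide), and checks the result against an assumed 255-character pattern limit. The hostnames are hypothetical placeholders; replace them with the valid ones from your Network report.

```python
import re
from typing import Iterable

# Assumption: GA filter patterns are limited to 255 characters.
GA_PATTERN_LIMIT = 255


def build_hostname_expression(hostnames: Iterable[str]) -> str:
    """Join hostnames with '|' and escape each '.' so it only matches a literal dot."""
    expression = "|".join(host.strip().lower().replace(".", r"\.") for host in hostnames)
    if len(expression) > GA_PATTERN_LIMIT:
        raise ValueError(
            f"expression has {len(expression)} characters, over the {GA_PATTERN_LIMIT} limit"
        )
    re.compile(expression)  # raises re.error if the result is not a valid regex
    return expression


# Hypothetical hostnames copied from the Network report (primary dimension: Hostname).
print(build_hostname_expression(["example.com", "example-store.net"]))
```

Assuming GA's unanchored matching, example\.com also matches subdomains such as www.example.com, so listing the root domain is often enough.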

To create a hostname filter to block all ghost spam in Google Analytics:

  1. Go to the Admin tab, and select the view where you want to apply the filter.
  2. Select Filters under the View column, and select + Add Filter
  3. Name the filter "Include Valid Hostnames (your domain)".
  4. Configure the filter as follows:
    • Filter Type: Custom > Include
    • Filter Field: Hostname
  5. In the Filter Pattern box copy the hostname expression that you built before.
  6. You can click Verify this filter to get a quick glance at how the filter will work. Keep in mind, though, that this feature only uses a small amount of data, so you might get the following message:

    "This filter would not have changed your data. Either the filter configuration is incorrect, or the set of sampled data is too small"

    If you want to test it using all your data here is how to verify a filter with an in-table filter. ;)

  7. Once you are sure your filter is ok, Save it.
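Since Verify this filter only previews a small sample, it can help to reason through what an include filter does with each hit. The Python sketch below keeps a hit only when its hostname matches a valid-hostname expression. The expression and the hits are hypothetical, and the unanchored, case-insensitive matching is my assumption about GA's defaults. Ghost spam never actually visits your site, so it typically arrives with a fake or "(not set)" hostname and is dropped.

```python
import re

# Hypothetical valid-hostname expression for a site on example.com and example-store.net.
VALID_HOSTNAMES = r"example\.com|example-store\.net"


def passes_include_filter(hostname: str, pattern: str = VALID_HOSTNAMES) -> bool:
    """An include filter keeps a hit only when the field matches the pattern."""
    return re.search(pattern, hostname, re.IGNORECASE) is not None


# Hypothetical hits: two real visits and two ghost-spam hits.
hits = [
    {"hostname": "www.example.com", "source": "google"},
    {"hostname": "example-store.net", "source": "facebook.com"},
    {"hostname": "(not set)", "source": "free-traffic.example"},
    {"hostname": "spam-host.example", "source": "spam-referrer.example"},
]

kept = [hit for hit in hits if passes_include_filter(hit["hostname"])]
print(f"kept {len(kept)} of {len(hits)} hits:", [hit["hostname"] for hit in kept])
```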

IMPORTANT: This filter doesn't require much maintenance, but it's essential to update the expression whenever you add the tracking ID (UA-00000-1) to a new service or domain.

d. Source filter to stop Crawler referral spam

Crawler spam uses a valid hostname so it is a bit harder to detect. To block it you'll need a filter with an expression that matches the source of all known crawler spam.

To save you some time, I've created a set of optimized regular expressions (REGEX) covering all the relevant crawler spam detected over the past few years. You'll find them in the instructions below.

How to create a filter to block crawler referrer spam in Google Analytics

To block referrer spam in Google Analytics you will need to create an exclude filter using the campaign source:

  1. Again go to the admin section of your GA.
  2. In the last column, "VIEW", select Filters and then click + Add Filter.
  3. Name the filter "Exclude Source - Bots #".
  4. Configure the filter as follows:
    • Filter Type: select Custom > Exclude
    • Filter Field: select Campaign Source (don't use the Referral field, or it won't work)
  5. Filter Pattern > Paste the following crawler referrer spam expression.
    These expressions were re-built to optimize the number of filters. If you created your filter before September 28, 2019, replace all the old expressions and remove any extra filter.

    Create 1 filter for each expression

    Crawler Expression 1

    TOTAL CHARACTERS: 245
    semalt|ranksonic|timer4web|anticrawler|dailyrank|sitevaluation|uptime(robot|bot|check|\-|\.com)|foxweber|:8888|mycheaptraffic|bestbaby\.life|(blogping|blogseo)\.xyz|(10best|auto|express|audit|dollars|success|top1|amazon|commerce|resell|99)\-?seo

    Crawler Expression 2

    TOTAL CHARACTERS: 254
    (artblog|howblog|seobook|merryblog|axcus|dotmass|artstart|dorothea|artpress|matpre|ameblo|freeseo|jimto|seo-tips|hazblog|overblog|squarespace|ronaldblog|c\.g456|zz\.glgoo|harriett|webedu)\.top|penzu\.xyz|xtraffic\.plus|easy-website-traffic\.com|tqwh\.net

    Get free notifications with the updated expressions whenever I detect new threats.

  6. Once everything is set, click Save.
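Before pasting the two crawler expressions, you can sanity-check them locally. The Python sketch below copies them verbatim, prints their lengths against the same assumed 255-character limit, and tests a few sample Campaign Source values (hypothetical examples), again assuming GA's unanchored, case-insensitive matching.

```python
import re

# The two crawler-spam expressions from this guide, copied verbatim
# (adjacent raw strings are concatenated by Python into one pattern).
CRAWLER_EXPRESSIONS = {
    "Crawler Expression 1": (
        r"semalt|ranksonic|timer4web|anticrawler|dailyrank|sitevaluation|"
        r"uptime(robot|bot|check|\-|\.com)|foxweber|:8888|mycheaptraffic|bestbaby\.life|"
        r"(blogping|blogseo)\.xyz|"
        r"(10best|auto|express|audit|dollars|success|top1|amazon|commerce|resell|99)\-?seo"
    ),
    "Crawler Expression 2": (
        r"(artblog|howblog|seobook|merryblog|axcus|dotmass|artstart|dorothea|artpress|matpre|"
        r"ameblo|freeseo|jimto|seo-tips|hazblog|overblog|squarespace|ronaldblog|c\.g456|"
        r"zz\.glgoo|harriett|webedu)\.top|penzu\.xyz|xtraffic\.plus|easy-website-traffic\.com|tqwh\.net"
    ),
}

# Assumption: GA filter patterns are limited to 255 characters.
GA_PATTERN_LIMIT = 255


def is_crawler_spam(source: str) -> bool:
    """True if any crawler expression matches the Campaign Source value."""
    return any(re.search(p, source, re.IGNORECASE) for p in CRAWLER_EXPRESSIONS.values())


for name, pattern in CRAWLER_EXPRESSIONS.items():
    print(f"{name}: {len(pattern)} characters, fits: {len(pattern) <= GA_PATTERN_LIMIT}")

# Sample Campaign Source values.
for source in ["semalt.semalt.com", "uptimerobot.com", "127.0.0.1:8888",
               "auto-seo-service.example", "squarespace.com", "google"]:
    print(f"{source} -> {'excluded' if is_crawler_spam(source) else 'kept'}")
```

Notice that squarespace.com is kept: the second expression only targets squarespace.top, so a legitimate Squarespace referral is not excluded.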

Note: These are common sources. If you find other referrals that are not useful for your Analytics, for example, mobile test sites, project management tools, or monitoring services, you can create a similar filter with the same configuration.

e. Language filter for sneaky crawlers and bots

From time to time you may see weird languages showing up in your Analytics. I prepared an expression that excludes any language that doesn't have a proper format like es-ES, en-US, fr-FR, etc.

I also added to the expression the language "c", which seems to be left by bots.

  • Create a new filter with the following settings:
    • Filter name: Exclude Language - Bots
    • Filter configuration:
      • Filter type: Custom > Exclude
      • Filter field: Language Settings
      • Filter pattern: enter the following expression as it is:
        \s[^\s]*\s|.{15,}|\.|,|^c$
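To see which Language values the expression excludes, the Python sketch below runs it against a few hypothetical values, assuming the same unanchored, case-insensitive matching. Properly formatted codes such as en-us or zh-hans-cn are kept, while the single letter c, values with a dot or comma, values of 15 or more characters, and values containing two or more whitespace characters are excluded.

```python
import re

# Language expression from this guide, copied verbatim.
LANGUAGE_EXPRESSION = r"\s[^\s]*\s|.{15,}|\.|,|^c$"


def is_bot_language(language: str) -> bool:
    # \s[^\s]*\s : two or more whitespace characters anywhere
    # .{15,}     : 15 or more characters
    # \.  and  , : a dot or a comma anywhere
    # ^c$        : exactly "c" (so a real code like "ca" is kept)
    return re.search(LANGUAGE_EXPRESSION, language, re.IGNORECASE) is not None


# Hypothetical Language Settings values.
for value in ["en-us", "zh-hans-cn", "ca", "c", "en-us,en;q=0.9", "free traffic for your site"]:
    print(f"{value!r} -> {'excluded' if is_bot_language(value) else 'kept'}")
```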

f. Extra: Enabling "Exclude all hits from known bots and spiders"

This is a pre-built filter that takes care of known bots from the IAB bots and spiders list. It is not perfect, but it helps.

In this case, it is a bit easier than with custom filters because you just need to check a box.

How to enable bot filtering

  1. Again in the Admin section of your Analytics, select your Master view under the VIEW column. (Also for any other filtered view)
  2. Click View Settings
  3. Near the bottom check the box Exclude all hits from known bots and spiders (Bot Filtering)
  4. Save and repeat the process with all your Views

What's next? Get even more value from Analytics data

  1. Clean spam from past data: The filters above will prevent future hits, here you can find instructions to clean spam from your historical data.
  2. Exclude internal traffic: This type of junk traffic is often overlooked. If you don't apply filters to the traffic generated by you or other people on your team, this data will get mixed up with your real visitor data and, unlike spam, it is much harder to identify later.


Wrapping it up

Whether you are a blogger, a small local website, or a multinational company, filtering your data is crucial for the accuracy of your reports.

"Even on high-volume websites where data spamming would be marginal, you still have to explain why there's such a discrepancy. As an analyst, you can't dismiss it simply by saying 'nah... we're not too sure what it is...'"

-Stéphane Hamel

However, you have to do it right. Handling each spammer individually is time-consuming and inefficient. The Google Analytics spam filters explained in this guide may take a bit longer to configure but they will save you a lot of time in the long run.

I will be updating this guide as new threats appear so you can keep it as a reference.

Do you have any questions or feedback?

I've tried to cover every important detail in this guide. However, if there is any part where you got stuck, let me know in the comments section below.

If this article helped you, consider leaving a comment below with your experience; it may help others! :)

Need help implementing, configuring, and/or protecting your Google Analytics? I can help.

Need help setting up a robust and reliable Google Analytics reporting for your website/business?

  • Filters for data quality
  • User interaction tracking (events, goals)
  • E-commerce tracking
  • GDPR compliance
  • Google Tag Manager implementation
  • Integrations (Google Ads, Search Console, etc)
  • Custom reports (Dashboards, Data studio)
  • Monthly reporting and more...