To make it easier to follow, I split this guide into 4 parts. The first 2 are practical and the last 2 are informative:

Author

Analytics & GTM Expert

UX-SEO Advocate

Category | Google Analytics

Filtering junk traffic in Google Analytics: A comprehensive solution

Google Analytics is probably one of the most important elements in your website's decision-making process. Whether you can tell the success or failure of your efforts (SEO, ad campaigns, social media, content marketing, etc.) depends on the accuracy and value of your GA reports.

If you don't take the appropriate measures, unwanted data such as bots, internal traffic, and spam will decrease that accuracy and in some cases lead to poor decisions.

The good news is that GA has powerful filtering functionality that, if used well, will help you keep out all that junk traffic. The bad news is that, in my experience, most sites don't use it properly.

99% of the sites I audit either don't use filters at all or use them incorrectly, which can create an even bigger problem.

So to help you get data that you can trust, I will show you:

  1. The most effective ways of filtering bots, spam, and other junk traffic in your Google Analytics,
  2. And, just as important, how to do it safely so you don't put your real user data at risk.
Latest identified spam and bot domains (February 2020):

automatedtraffic4free.club / automatedtraffic4free.com / automatedtraffic4free.host / automatedtraffic4free.pw / bottraffic.host / bottraffic4free.host / bottraffic4free.pw / bottraffic4free.xyz / easy-website-traffic.com / getbottraffic4free.com / tracsistraffic.com / trafficbot.club / trafficbot4free.com / trafficbot4free.host / trafficbot4free.pw / trafficbot4free.xyz / websitebottraffic.com / websitebottraffic.pw / websitebottraffic.xyz

A quick FAQ about this guide

To save you some time looking through the comments, here are the answers to some of the most common questions I get:

  • Which type of spam and bots does this guide cover?
    • This guide will help you prevent common threats,
  • Does this work in WordPress, Joomla, Shopify, Wix, Weebly, Squarespace...?
    • Yes. The solutions below are based purely on GA filters, so they will work regardless of the CMS you use.
  • How often do you check for new threats and update the expressions?
    • I'm constantly monitoring for bots and new spam (3-5 times a week), and I update the expressions when significant new threats are detected. You can keep this guide as a reference or, even better, get notified when new expressions are released (see the historical spam blacklist).

Dos and don'ts when filtering data in Google Analytics

Filters are a powerful tool if used in the right way. So let's go quickly through a list of things you should consider when filtering in GA.

First things first. Protect your data from misconfigurations

Before creating any filter in Google Analytics, you should have at least 2 views: one where you will apply the filters and a second one that you will leave unfiltered. The unfiltered view works as a backup and lets you check the progress of your filters. If you want to be extra cautious, you can also create a test view to try out your filters before applying them.

Here you can find how to create views in Google Analytics and set them up following best practices.

6 types of filters to stop bots and spam in Google Analytics

Once your views are correctly configured, it's time to stop all of that dirty traffic that skews your reports and doesn't let you see the real performance of your site.

There is no single almighty solution or checkbox that will stop all junk traffic at once, so if you want accurate Analytics, you will have to work for it.

The Google Analytics filters you will need are:

  1. ISP organization / Service Provider filters for common bot traffic,
  2. ISP domain / Network filter to stop more common bot traffic,
  3. Campaign source filters for crawler referral spam,
  4. Valid hostname filter for ghost spam and DEV environments,
  5. Language filter for bots,
  6. Static and dynamic filters for internal traffic,
  7. Bonus: Enable the built-in feature "Bot Filtering" (to exclude a few known bots)
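Conceptually, each exclude filter in the list above looks at one field of a hit and drops the hit if its pattern matches. The Python sketch below models that idea. The `ExcludeFilter` class, the field names `isp_organization` and `language`, and the sample hits are all hypothetical, and it assumes GA applies each pattern as an unanchored, case-insensitive regex match (like `re.search` with `re.IGNORECASE`).

```python
import re
from dataclasses import dataclass


@dataclass
class ExcludeFilter:
    """Hypothetical model of a GA custom exclude filter: one field, one regex."""
    name: str
    field: str
    pattern: str

    def matches(self, hit: dict) -> bool:
        # Assumption: the pattern is applied like re.search (unanchored) and
        # case-insensitively; a missing field is treated as an empty string.
        value = hit.get(self.field, "")
        return re.search(self.pattern, value, re.IGNORECASE) is not None


def apply_exclude_filters(hits: list, filters: list) -> list:
    """Return only the hits that no exclude filter matches."""
    return [hit for hit in hits if not any(f.matches(hit) for f in filters)]


filters = [
    ExcludeFilter("Exclude ISP - Bots 2", "isp_organization", r"vultr\sholdings|hos\-329450"),
    ExcludeFilter("Exclude Language - Bots", "language", r"\s[^\s]*\s|.{15,}|\.|,|^c$"),
]

hits = [
    {"isp_organization": "comcast cable", "language": "en-us"},       # real visitor
    {"isp_organization": "vultr holdings llc", "language": "en-us"},  # bot ISP
    {"isp_organization": "comcast cable", "language": "c"},           # bot language
]

clean_hits = apply_exclude_filters(hits, filters)
print(len(clean_hits))  # 1
```

Because these are all exclude filters, their order doesn't change which hits survive. Order only starts to matter once you add filters that rewrite field values.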

a. ISP organization / Service provider filter to stop bot traffic

One of the largest sources of useless traffic in Google Analytics is bots, and they come in many flavors. There are bad ones, like scrapers, but there are also good ones, like the ones used for indexing, testing, and securing your site.

For example, the bots Google uses to index your site show up under the ISP "Google LLC". Others come from the ISP "Facebook Ireland Ltd"; these are related to Facebook and Instagram ads and are probably checking whether the site is safe.

In either case, the data they leave is irrelevant to your Analytics and should be filtered out.

In most of the Analytics properties I audit, there is bot traffic coming from the following ISPs, and it represents between 10% and 30% of the property's total traffic.

List of the most common bots by ISP organization:
alibaba.com llc | amazon data services brazil | amazon data services canada
amazon data services france | amazon data services india | amazon data services ireland limited
amazon data services ireland ltd | amazon data services japan | amazon data services nova
amazon data services singapore | amazon data services uk | amazon technologies inc.
amazon.com inc. | chinanet fujian network | chinanet fujian province network
digitalocean llc | early registration addresses | evercompliant ltd.
facebook ireland ltd | google corporate network | google inc.
google llc | google switzerland gmbh | hubspot
inktomi corporation | kazooisyee | linode
linode llc | linode llc sg | microsoft corp
microsoft corporation | online sas | online sas nl
ovh hosting inc. | putian city fujian provincial network of cncgroup | putian city fujian provincial network of unicom
vultr holdings llc | vultr holdings llc frankfurt
Note: These are just a few examples of ISPs with high bot activity; the expressions below may contain more, and they are constantly updated.

  • Create a new filter with the following settings (1 for each expression):

    • Filter Name: Exclude ISP - Bots #
    • Filter configuration:
      • Filter type: Custom > Exclude
      • Filter field: ISP organization
      • Filter pattern: enter the following expressions as they are:

    ISP Bot Expression 1

    TOTAL CHARACTERS: 255
    hubspot|^google\sllc$|^google\sinc\.$|alibaba\.com\sllc|ovh\shosting\sinc\.|microsoft\scorp|facebook\sireland\sltd|online\ssas|evercompliant|early\sregistration\saddresses|inktomi\scorporation|google\scorporate|google\sswitzerland\sgmbh|kazooisyee|cloud69

    ISP Bot Expression 2

    TOTAL CHARACTERS: 27
    vultr\sholdings|hos\-329450

    ISP Bot Expression *

    TEST THIS FILTER BEFORE APPLYING IT

    The following filter could help you prevent large amounts of bot traffic; however, you should test it in your Analytics before applying it.

    I extensively test the expressions below across many GA properties to avoid interfering with real user data. However, in rare cases the expressions could still match some real user data.

    For example, call tracking tools often use cloud services (which look like bots) to send data to Google Analytics. A common case is CallRail, which uses Amazon cloud services; in that case you should remove the Amazon ISPs from the expression.

    You can use this method to test the filter and see how it will work in your GA.

    TOTAL CHARACTERS: 145
    chinanet\sfujian|putian\scity\sfujian|linode\sllc|amazon\.com\sinc\.|amazon\stechnologies\sinc\.|digitalocean\sllc|linode$|amazon\sdata\sservices
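Before saving Expression *, you can preview which ISP names it would exclude. The Python sketch below checks each ISP expression against GA's 255-character limit for a filter pattern (presumably why the TOTAL CHARACTERS counts are listed) and splits a list of ISP names into "would be excluded" and "would be kept". The sample names are hypothetical; in practice you would paste the values from your own Service Provider report. It assumes GA matches these simple patterns the same way Python's `re.search` does with case-insensitive matching.

```python
import re

GA_PATTERN_LIMIT = 255  # maximum length of a GA filter pattern

ISP_EXPRESSIONS = {
    "ISP Bot Expression 1": (
        r"hubspot|^google\sllc$|^google\sinc\.$|alibaba\.com\sllc|ovh\shosting\sinc\."
        r"|microsoft\scorp|facebook\sireland\sltd|online\ssas|evercompliant"
        r"|early\sregistration\saddresses|inktomi\scorporation|google\scorporate"
        r"|google\sswitzerland\sgmbh|kazooisyee|cloud69"
    ),
    "ISP Bot Expression 2": r"vultr\sholdings|hos\-329450",
    "ISP Bot Expression *": (
        r"chinanet\sfujian|putian\scity\sfujian|linode\sllc|amazon\.com\sinc\."
        r"|amazon\stechnologies\sinc\.|digitalocean\sllc|linode$|amazon\sdata\sservices"
    ),
}


def preview(pattern: str, isp_names: list) -> tuple:
    """Split ISP names into (excluded, kept), as an exclude filter would."""
    regex = re.compile(pattern, re.IGNORECASE)
    excluded = [name for name in isp_names if regex.search(name)]
    kept = [name for name in isp_names if not regex.search(name)]
    return excluded, kept


for name, pattern in ISP_EXPRESSIONS.items():
    status = "OK" if len(pattern) <= GA_PATTERN_LIMIT else "TOO LONG"
    print(f"{name}: {len(pattern)} characters ({status})")

# Hypothetical sample; replace with the values from your own report.
SAMPLE_ISPS = [
    "google llc",
    "google fiber inc.",
    "amazon.com inc.",
    "comcast cable communications llc",
    "linode",
    "linode llc sg",
]

excluded, kept = preview(ISP_EXPRESSIONS["ISP Bot Expression *"], SAMPLE_ISPS)
print("would be excluded:", excluded)
print("would be kept:", kept)
```

If a name you know belongs to real visitors lands in the excluded list (for example, the Amazon ISPs when you use CallRail), remove that alternative from the expression before saving the filter.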

b. ISP domain/network domain filter to stop bot traffic

This filter is similar to the previous one but this time we will target the ISP domain/network domain.

This one includes a recent Googlebot that started around June of 2019, with the following names:

List of the most common bots by ISP domain:
googlebot.com | googleusercontent.com | google.com (ISP)
paloaltonetworks.com (for the "Amazon" keyword coming from Bing organic traffic)

  • Create a new filter with the following settings:
    • Filter Name: Exclude ISP Domain - Bots 1
    • Filter configuration:
      • Filter type: Custom > Exclude
      • Filter field: ISP domain
      • Filter pattern: enter the following expression as it is

        paloaltonetworks|scaleway|kcura|^google(\.com$|usercontent\.com|bot\.com)$
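The `^` and `$` anchors around the Google alternatives mean only those three exact domains are excluded, not every ISP domain that merely contains "google". Here is a quick Python check of that behavior. The sample domains are hypothetical, and it assumes GA matches like `re.search` with case-insensitive matching.

```python
import re

# ISP domain expression from the filter above.
ISP_DOMAIN_PATTERN = r"paloaltonetworks|scaleway|kcura|^google(\.com$|usercontent\.com|bot\.com)$"


def is_excluded(isp_domain: str) -> bool:
    # Assumption: unanchored, case-insensitive matching, like re.search + IGNORECASE.
    return re.search(ISP_DOMAIN_PATTERN, isp_domain, re.IGNORECASE) is not None


# Hypothetical ISP domain values for illustration.
SAMPLE_DOMAINS = [
    "googlebot.com",
    "googleusercontent.com",
    "google.com",
    "paloaltonetworks.com",
    "google.com.br",
    "comcast.net",
]

for domain in SAMPLE_DOMAINS:
    print(f"{domain:22} -> {'exclude' if is_excluded(domain) else 'keep'}")
```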

c. Campaign Source filter to stop Crawler referral spam

To block crawler spam you'll need a filter with an expression that matches the campaign source of all crawler spam.

To save you some time, I've created a set of optimized regular expressions (REGEX) covering all the relevant crawler spam detected over the last few years. You'll find them in the instructions below.

How to create a filter to block crawler referrer spam in Google Analytics

To block referrer spam in Google Analytics you will need to create an exclude filter using the campaign source:

  1. Again go to the admin section of your GA.
  2. In the last column, "VIEW", select Filters and then click + Add Filter
  3. Name the filter "Exclude Source - Bots #"
  4. Configure the filter as follows:
    • Filter Type select Custom > Exclude
    • Filter Field select Campaign Source (don't use referral field or it won't work)
  5. Filter Pattern > Paste the following crawler referrer spam expression.
    These expressions were rebuilt to reduce the number of filters needed. If you created your filters before September 28, 2019, replace all the old expressions and remove any extra filters.

    Create 1 filter for each expression

    Crawler Expression 1

    TOTAL CHARACTERS: 213
    semalt|ranksonic|timer4web|anticrawler|uptime(robot|bot|check|\-|\.com)|foxweber|:8888|xtraffic\.plus|(christopherblog|tammyblog|billyblog)\.online|traffic4free|bottraffic|easy-website\-traffic|bot4free|trafficbot

    Crawler Expression 2

    TOTAL CHARACTERS: 249
    (axcus|dotmass|artstart|dorothea|artpress|matpre|ameblo|freeseo|jimto|seo-tips|hazblog|overblog|squarespace|ronaldblog|c\.g456|zz\.glgoo|harriett|webedu|barbarahome|verabauer|deirdre|ninacecillia|reginanahum|deniseconnie|firstblog|maxinesamson)\.top

  6. Once everything is set, click Save.

Note: These are common sources. If you find other referrals that are not useful for your Analytics, for example mobile test sites, project management tools, monitoring services, or spam that isn't listed here, you can create an additional filter with the exact same configuration.
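To see which of the latest identified domains (February 2020, listed at the top of this guide) the two crawler expressions already cover, and to confirm that each one stays within GA's 255-character pattern limit, you could run a quick script like this Python sketch. It assumes that the campaign source of referral spam is the referring domain itself and that GA matches the pattern like an unanchored, case-insensitive `re.search`.

```python
import re

CRAWLER_EXPRESSIONS = [
    # Crawler Expression 1
    r"semalt|ranksonic|timer4web|anticrawler|uptime(robot|bot|check|\-|\.com)|foxweber"
    r"|:8888|xtraffic\.plus|(christopherblog|tammyblog|billyblog)\.online|traffic4free"
    r"|bottraffic|easy-website\-traffic|bot4free|trafficbot",
    # Crawler Expression 2
    r"(axcus|dotmass|artstart|dorothea|artpress|matpre|ameblo|freeseo|jimto|seo-tips"
    r"|hazblog|overblog|squarespace|ronaldblog|c\.g456|zz\.glgoo|harriett|webedu"
    r"|barbarahome|verabauer|deirdre|ninacecillia|reginanahum|deniseconnie|firstblog"
    r"|maxinesamson)\.top",
]

# The "Latest identified (February 2020)" list from the top of this guide.
LATEST_DOMAINS = (
    "automatedtraffic4free.club / automatedtraffic4free.com / automatedtraffic4free.host / "
    "automatedtraffic4free.pw / bottraffic.host / bottraffic4free.host / bottraffic4free.pw / "
    "bottraffic4free.xyz / easy-website-traffic.com / getbottraffic4free.com / tracsistraffic.com / "
    "trafficbot.club / trafficbot4free.com / trafficbot4free.host / trafficbot4free.pw / "
    "trafficbot4free.xyz / websitebottraffic.com / websitebottraffic.pw / websitebottraffic.xyz"
)


def uncovered(sources: list, patterns: list) -> list:
    """Return the sources that none of the exclude patterns would match."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [s for s in sources if not any(rx.search(s) for rx in compiled)]


sources = [d.strip() for d in LATEST_DOMAINS.split("/")]
for i, pattern in enumerate(CRAWLER_EXPRESSIONS, start=1):
    print(f"Crawler Expression {i}: {len(pattern)} of 255 characters")
print("Not covered:", uncovered(sources, CRAWLER_EXPRESSIONS))
```

Run as written, the only domain from that list the two expressions don't catch is tracsistraffic.com. If it shows up in your reports, that is the kind of case the additional filter in the note above is for.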

d. Valid hostname filter to stop ghost spam and development environments

Ghost spam is less frequent nowadays than it was a couple of years ago; however, I still recommend having this filter in place in case a new wave arrives. It will also help you keep out useless traffic from dev sites and scrapers.

Here you will find detailed instructions on how to build a valid hostname filter.
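The key difference from the other filters in this guide is that a valid hostname filter is an include filter. A hit is kept only if its hostname matches one of your own domains, so ghost spam that never touches your site (and often reports a fake or "(not set)" hostname) is dropped, along with dev environments. Here is a minimal Python sketch of that logic. `example.com` and `shop.example.net` are placeholder hostnames, and the unanchored, case-insensitive matching is an assumption.

```python
import re

# Placeholder hostnames: list your own domain(s) and any third-party services
# that legitimately serve your tracked pages.
VALID_HOSTNAMES = r"example\.com|shop\.example\.net"


def keep_hit(hostname: str) -> bool:
    """Include filter: keep the hit only if its hostname matches a valid host."""
    return re.search(VALID_HOSTNAMES, hostname, re.IGNORECASE) is not None


SAMPLE_HOSTNAMES = [
    "www.example.com",   # your site
    "shop.example.net",  # your store
    "(not set)",         # often seen with ghost spam
    "localhost",         # development machine
    "dev.example.org",   # staging/dev environment
]

for host in SAMPLE_HOSTNAMES:
    print(f"{host:18} -> {'keep' if keep_hit(host) else 'drop'}")
```

Because the match is unanchored, a forged hostname such as example.com.spam.xyz would still pass. Ending the domain with `$` (for example `example\.com$`) is one way to tighten it.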

e. Language filter for sneaky crawlers and bots

From time to time, you may see weird languages showing up in your Analytics. I prepared an expression that excludes any language value that doesn't have a proper format (proper values look like es-ES, en-US, fr-FR, etc.).

I also added the language "c" to the expression, since it seems to be left by bots.

  • Create a new filter with the following settings:
    • Filter name: Exclude Language - Bots
    • Filter configuration:
      • Filter type: Custom > Exclude
      • Filter field: Language Settings
      • Filter pattern: enter the following expression as it is:
        \s[^\s]*\s|.{15,}|\.|,|^c$
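Each part of the language expression targets a different kind of malformed value. The Python sketch below annotates the alternatives and runs them against a few hypothetical sample values, assuming GA matches the pattern like an unanchored, case-insensitive `re.search`.

```python
import re

LANGUAGE_BOT_PATTERN = (
    r"\s[^\s]*\s"  # at least two whitespace characters, e.g. a multi-word value
    r"|.{15,}"     # 15+ characters, far longer than a code such as en-us
    r"|\."         # any dot
    r"|,"          # any comma
    r"|^c$"        # exactly "c", a value that seems to be left by bots
)


def is_bot_language(value: str) -> bool:
    return re.search(LANGUAGE_BOT_PATTERN, value, re.IGNORECASE) is not None


SAMPLES = ["en-us", "es-es", "fr-fr", "c", "en-us,en;q=0.9", "free traffic for your site"]

for value in SAMPLES:
    print(f"{value!r:30} -> {'exclude' if is_bot_language(value) else 'keep'}")
```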

f. Static and dynamic filters for internal traffic

Not all junk traffic in Analytics comes from outside your company. In fact, a lot can come from within your team: developers, testers, marketers, support, curious employees, etc.

This type of junk traffic is often overlooked. If you don't filter it, it can easily get mixed up with the data from your real visits, and unlike spam, it is much harder to identify later.

g. Bonus: Enabling "Exclude all hits from known bots and spiders"

This is a built-in feature that takes care of known bots from the IAB bots and spiders list. It's not perfect, but it may help.

How to enable bot filtering

  1. In the Admin section of your Analytics, select your Master view under the VIEW column (this also applies to any other filtered view).
  2. Click View Settings
  3. Near the bottom check the box Exclude all hits from known bots and spiders (Bot Filtering)
  4. Save and repeat the process with all your Views

What's next? Clean junk traffic from past data

As you know, filters only work going forward. To clean spam and bots from your historical data, you will need to create an advanced segment using this guide:

Wrapping it up

Your Google Analytics is only as good as the data it contains. If you don't filter it properly, you can end up with inflated reports that don't represent the real performance of your site.

"Even on high volume websites where data spamming would be marginal, you still have to explain why there's such a discrepancy. As an analyst you can't dismiss it simply by saying 'nah... we're not too sure what it is...'"

-Stéphane Hamel

The filters and pre-built expressions in this guide will help you keep your Analytics data in good shape, so you can feel confident when you make decisions based on it.

I will be updating this guide as new threats appear so you can keep it as a reference.

Do you have any questions or feedback?

I've tried to cover every important detail in this guide. However, if you got stuck on any part of it, let me know in the comments section below.

Need help setting up reliable and useful Google Analytics for your website/business?

  • Filters for data quality
  • User interaction tracking (events, goals)
  • E-commerce tracking
  • Conversions, Goal & Funnel Configuration
  • Sub-domains & Cross-domain tracking
  • Dynamic IP filtering
  • Google Tag Manager implementation
  • AMP tracking/integration
  • Integrations (Google Ads, Search Console, Facebook Ads, etc)
  • Personalized reports (Data Studio dashboards)
  • Monthly reporting
  • And more...