To make it easier to follow, I split this post into 4 parts: Filter, Clean, FAQ, and List.

Author

Analytics advocate

SEO expert

User Experience passionate

Category | Google Analytics

Why do spam and bots keep showing up in your Google Analytics?

Google Analytics has a powerful built-in filter functionality that helps against spam, bots, and other junk traffic that damages the integrity of the data. However, I often find sites that don't use these tools, or that use them in the wrong way.

To help you get accurate data and avoid wasting your time on partial solutions, I will show you how to efficiently filter any type of junk traffic in your Analytics, and how to do it safely so you don't risk your real user data.

Quick FAQ about this guide

I often get asked the following questions:

  • Which type of spam and bots does this guide cover?
    • All of them! The filters will help you stop any type of Google Analytics spam (referral, keyword, language, page, etc.) and many known bots.
  • Does this work in WordPress, Joomla, Shopify, Wix, Weebly, Squarespace?
    • The solutions below are purely GA based, so it will work independently of the platform (CMS) you use.
  • How often are the expressions updated for new spam?
    • This guide is updated as soon as new significant threats are detected, so you can keep it as a reference.

Spam and bots recently detected (last checked: October 19, 2018)

Examples of spam covered by:
  • The valid hostname filter for ghost spam:
    LATEST WAVE OF GHOST SPAM
    • seo-services-with-results.com/seo2.php
    • free-seo-help.org/seo2.php
    • worldwide-seo-services.com/seo2.php
    • your-seo-promotion.com/seo2.php
    • your-seo-promotion-service.com/seo2.php
    • my-seo-promotion.com/seo2.php
    • my-seo-promotion-service.com/seo2.php
    • seo-services-with-results.net/seo2.php
    • free-seo-help.net/seo2.php
    • worldwide-seo-services.net/seo2.php
    • your-seo-promotion.net/seo2.php
    • your-seo-promotion-service.net/seo2.php
    • my-seo-promotion.net/seo2.php
    • my-seo-promotion-service.net/seo2.php
    • page /h/1234567.html (the number changes)
    • Page title (not set)
  • Source filters for crawler referrer spam:  seoservices2018.com, resell-seo-services.com, blog100.org, 10bestseo.com + 200 more
  • ISP domain filter for bots: scaleway.com, fake amazon keywords from "bing" and network domain paloaltonetworks.
  • ISP organization filter for bots: online sas, microsoft corporation, hubspot and google llc bots
  • See the full historical spam blacklist 500.
Editorial note: Not all sites mentioned in this post were directly involved in spamming your Analytics. In some cases, spammers referred to these domains either to damage the reputation of the website or because the owner fell into a "cheap traffic service" trap, especially in the last wave of spam.
Last wave of spam:
  • seo-services-with-results.com
  • seo-services-with-results.net
  • better-seo-promotion.com
  • free-seo-help.net
  • free-seo-help.org
  • free-website-traffic.com
  • better-seo-promotion.net
  • your-seo-promotion.com
  • autoseo-services.net
  • your-seo-help.com
  • your-seo-promotion-service.com
  • my-seo-promotion.net
  • worldwide-seo-services.com
  • my-seo-promotion-service.com
  • makingsalesmakingmoney.com
  • originalbyt-paintings.info
  • performiz-like-alibaba.info
  • perform-likeir-alibaba.info
  • original-paintingsor.info
  • perform-likeism-alibaba.info
  • perform-like-alibabaity.info
  • perform-likeity-alibaba.info
  • nubuilderist.info
  • nubuilderle.info
  • nubuilderz.info
  • nubuilderfy.info
  • nubuilderof.info
  • nubuilderify.info

Want me to fix this for you? I can review your analytics and apply all the necessary measures to ensure you are receiving the most accurate data possible.

Myths about the spam in Analytics

Let's begin with what you shouldn't do. (If you made any of the mistakes below, undo the changes.)

How does ghost spam attack Google Analytics?

Protect your data from misconfigurations

Before doing anything else, make sure you have at least 2 views: one where you will apply the filters and a second one that you will leave unfiltered.

Here is a common set of views:

  • Master - The view where you will apply filters. It's the one used for analysis.
  • Unfiltered - Your backup view, which shouldn't have any filter or any setting that alters the incoming data.
  • Test (Optional) - If you want to be extra cautious, you can create a test view to try the filters first. Especially recommended for people getting started with filters.

If you need help creating these views, here you can find the instructions on how to create an unfiltered and a test view.

Google Analytics spam filters that really work

Once your views are correctly configured, it is time to stop all of that dirty traffic that skews your reports and doesn't let you see the real performance of your site.

The filters I'm going to show you have been proven to work for over 3 years regardless of the methods used by spammers (referral, keyword, page, language, etc.)

Want some proof? Here are some examples of users who followed this guide and shared their results with me.

The screenshots are from 2016, but the results for your analytics will be the same today.

These are the Google Analytics filters you will need:

  1. Hostname filter for ghost spam (referral, page, keyword spam, language, etc.),
  2. Campaign source filter for crawler referral spam,
  3. Language filter for sneaky spam (and some bots),
  4. ISP organization filter for bot traffic,
  5. ISP domain / Network filter to stop bot traffic,
  6. Extra: Enable the built-in feature "Bot Filtering" (to exclude a few known bots)

Need help setting up a robust and reliable reporting for your website/business?

  • Filters for data quality 
  • User interaction tracking (events, goals)
  • E-commerce tracking
  • GDPR compliance
  • Google Tag Manager implementation
  • Integrations (Google Ads, Search Console, etc)
  • Custom reports (Dashboards, Data studio)
  • Monthly reporting and more...

General notes about filters.

  • While most of the time filters start working within minutes, officially it may take up to 24 hours before the filter effects become visible in your data, so be patient.
  • You will apply the filters either in the master view, the view(s) to be used for analysis, or the test view if you want to try them first.
  • Filters only work going forward. For historical data, you will use the segment (3rd step).

a. Hostname filter to stop Ghost spam 

The valid hostname filter is the single most effective solution against spam. This filter will permanently stop all ghost traffic no matter how it comes or what name it uses.

The difference between this solution and others that are commonly shared is that this filter is based on something that you control: your hostnames. You won't have to update the filter when new spam shows up, and as long as you add all your hostnames, you won't exclude any real traffic.

What is a hostname vs a source?

People often mistake hostnames for sources. To avoid confusion while preparing the filters, I'll briefly explain the difference:

  • The source is where your visit comes from, and there can be any number of them: for example, Facebook, Google, Twitter, YouTube, links from other sites to your site, etc.
  • The hostname, on the other hand, is the site where the visitor arrives. Your main hostname will be your domain and, depending on the configuration of your site, there may be others.

To understand how this filter works, you must know how ghost spam works. The spammers that use this technique abuse the Measurement Protocol, a tool that allows sending data directly to GA (and that was obviously created for other purposes). Since the spammers don't know who they are hitting, they always leave a fake hostname or an "undefined" hostname, which will appear as (not set) in your reports.

If we use this logic to create a filter that only lets through traffic with valid hostnames, all ghost traffic will be automatically excluded. This solution is much more efficient than the one commonly used, which is to create a filter with the names of the spammers. Plus, this technique will work for any type of ghost spam: referral, keyword, page, language, etc.
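The logic above can be sketched in a few lines of Python. This is only an illustration of how an include filter behaves, not GA's actual implementation. The hits, the hostnames (`example.com`, `shop.example.com`), and the `keep_valid_hostnames` helper are all made up for the example, and case-insensitive matching is an assumption.

```python
import re

# Hypothetical valid-hostname expression; replace it with your own hostnames.
VALID_HOSTNAMES = re.compile(r"example\.com|shop\.example\.com", re.IGNORECASE)

# Made-up hits as (hostname, source) pairs. Ghost spam never loads your site,
# so its hostname is fake or missing, which reports show as "(not set)".
hits = [
    ("example.com", "google"),
    ("shop.example.com", "facebook.com"),
    ("(not set)", "free-seo-help.org"),
    ("fake-host.xyz", "seo-services-with-results.com"),
]

def keep_valid_hostnames(hits):
    """Mimic an include filter: keep only hits whose hostname matches."""
    return [hit for hit in hits if VALID_HOSTNAMES.search(hit[0])]

kept = keep_valid_hostnames(hits)
for hostname, source in kept:
    print(hostname, source)
```

Notice that the filter never needs to know the spammer's name: the two ghost hits are dropped only because their hostname is not one of yours.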

To build this filter you will need 3 things:
  1. Make a list of your hostnames:
    • To see a list of all active hostnames, go to the Network report in your Analytics:
      • Audience > Technology > Network
    • Change the primary dimension to Hostname (blue text at the top of the report).
    • Make a list of all the valid ones you find. You should see at least one valid hostname, which is your main domain; the rest will depend on the configuration of your site.
  2. Build your hostname expression: Once you have the list of all your hostnames, put them together in a single expression, separating them with a pipe "|" character.
    If you need extra help finding your valid hostnames and building your expression, check this guide or let me know and I can personally help you.
  3. Create the filter: Once you are sure the expression is correct, create the filter by following the steps in the next section.
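As a rough sketch of step 2, the expression can be assembled and checked with a small Python helper. The hostnames below are placeholders, `build_hostname_expression` is a hypothetical helper, and the 255-character limit for the Filter Pattern field is an assumption you should confirm in your own account.

```python
import re

# Assumed limit for the Filter Pattern field; verify it in your own account.
MAX_PATTERN_LENGTH = 255

def build_hostname_expression(hostnames):
    """Escape the dots in each hostname and join them with the pipe character."""
    # An unescaped dot means "any character" in a regular expression.
    escaped = [name.strip().lower().replace(".", r"\.") for name in hostnames]
    expression = "|".join(escaped)
    if len(expression) > MAX_PATTERN_LENGTH:
        raise ValueError("Expression is too long for a single filter")
    return expression

# Placeholder hostnames; use the ones from your own Network report.
expression = build_hostname_expression(
    ["example.com", "shop.example.com", "example.checkout-service.com"]
)
print(expression)

# Quick sanity check before pasting the expression into the filter.
for hostname in ["shop.example.com", "(not set)"]:
    verdict = "kept" if re.search(expression, hostname) else "excluded"
    print(hostname, "->", verdict)
```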

How to create a filter to block ghost spam in Google Analytics

To block all ghost spam in Google Analytics, you need to create an include hostname filter:

  1. Go to the Admin tab, and select the view where you want to apply the filter. If you follow the naming above, this will be the Master view or Test view.
  2. Select Filters under the View column, and select + Add Filter.
  3. Enter Include Valid Hostnames as the name of the filter.
  4. Configure the filter as follows:
    • Filter Type: Custom > Include
    • Filter Field: Hostname
  5. In the Filter Pattern box, paste the hostname expression that you built before.
  6. You can click Verify this filter to get a quick glance at how the filter will work. Keep in mind that this feature only checks a small sample of data, so you might get the following message:

    "This filter would not have changed your data. Either the filter configuration is incorrect, or the set of sampled data is too small"

    Don't worry. If you followed the instructions in the previous step (Build your hostname expression), you have already tested your filter. If you missed that step, you can still verify the expression with an in-table filter. ;)

  7. Once you are sure your filter is correct, save the filter.

IMPORTANT: This filter doesn't require updates for new ghost spam, but it's essential to update the expression whenever you add the tracking ID (UA-00000-1) to a new service or domain.

b. Source filter to stop Crawler referral spam

Crawler spam uses a valid hostname so it is a bit harder to detect. To block it you'll need a filter with an expression that matches the source of all known crawler spam.

To save you some time, I've created a set of optimized regular expressions (REGEX) with all the relevant crawler spam detected over the last few years. You'll find them below in the instructions.

How to create a filter to block crawler referrer spam in Google Analytics

To block referrer spam in Google Analytics you will need to create an exclude filter using the campaign source:

  1. Again, go to the Admin section of your GA.
  2. In the last column, "VIEW", select Filters and then click + Add Filter.
  3. Enter "Exclude Crawler Spam" as the name of the filter.
  4. Configure the filter as follows:
    • Filter Type: select Custom > Exclude
    • Filter Field: select Campaign Source (don't use the Referral field or it won't work)
  5. Filter Pattern: paste the following crawler referrer spam expression.

    Create 1 filter for each expression

    Crawler Expression 1

    TOTAL CHARACTERS: 230
    uptime(robot|bot|check|\-|\.com)|vitaly|sharebutton|semalt|ranksonic|share\-button|anticrawler|timer4web|free\-video\-tool|responsive\-test|dogsrun|fix\-website\-er|dailyrank|sitevaluation|99seo|top10\-way|seo(\-2\-0|\-analysis)\.

    Crawler Expression 2

    TOTAL CHARACTERS: 224
    (videos|buttons)\-for\-your|best\-seo\-(solution|offer)|buttons\-for\-website|profit\.xyz|dbutton|keywords\-monitoring|platezhka|7makemoney|forum69|kings\-analytics|checkpagerank|pr\-cy\.ru|\-\-(production|website|sale)\.com

    Crawler Expression 3

    TOTAL CHARACTERS: 248
    (express|audit|dollars|success|top1|amazon|commerce)\-seo|free\-video\-tool|datract|hacĸer|ɢoogl|slifty\.github|\-liar.ru|3\-letter\-|foxweber|free\-fbook|goodwriterssales|your\-rankings|tourcroatia|spinnerco|justkillingti|suralink|worldtraveler\.w

    Crawler Expression 4

    TOTAL CHARACTERS: 238
    oldfaithfultaxi|christopherlane|hollywoodweeklymagazine|losangeles\-ads|anniemation|timdreby|pcimforum|yellowstonesafaritours|autoseo|blogarama|for\-placing|brainwizard|casinos4|ḷ\.com|\-backlinks\.com|phoenicx\.co\.uk|be\-escorts|vidyoze

    Crawler Expression 5

    TOTAL CHARACTERS: 196
    brasseriebread|helvetiiconsulting|johntrapane|cloudsendchef|theautoprofit|:8888|blog1989|incomekey|amazon\-ads\.ovh|krumble\.net|10bestseo|seo\-watch|blog100|seoservices2018|resell\-seo|auto\-?seo


  6. After everything is set, click Save.
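Before saving, you may want to check an expression against a few sources. The sketch below does that in Python for Crawler Expression 1, copied from step 5. The list of sample sources is made up, the `is_crawler_spam` helper is hypothetical, and both the case-insensitive partial matching and the 255-character limit are assumptions about how the filter field behaves.

```python
import re

# Crawler Expression 1, copied from step 5 (split only to keep lines short).
CRAWLER_EXPRESSION_1 = (
    r"uptime(robot|bot|check|\-|\.com)|vitaly|sharebutton|semalt|ranksonic"
    r"|share\-button|anticrawler|timer4web|free\-video\-tool|responsive\-test"
    r"|dogsrun|fix\-website\-er|dailyrank|sitevaluation|99seo|top10\-way"
    r"|seo(\-2\-0|\-analysis)\."
)

# Assumed limit for the Filter Pattern field; verify it in your own account.
MAX_PATTERN_LENGTH = 255

pattern = re.compile(CRAWLER_EXPRESSION_1, re.IGNORECASE)

def is_crawler_spam(source):
    """Return True if the campaign source matches the exclude expression."""
    return pattern.search(source) is not None

# Made-up sample of campaign sources, mixing spam and legitimate traffic.
sources = ["semalt.com", "uptimerobot.com", "google", "facebook.com", "sharebutton.net"]

excluded = [s for s in sources if is_crawler_spam(s)]
kept = [s for s in sources if not is_crawler_spam(s)]

print("Pattern length:", len(CRAWLER_EXPRESSION_1), "of", MAX_PATTERN_LENGTH)
print("Excluded:", excluded)
print("Kept:", kept)
```

The same check works for the other four expressions: swap the string and run it against sources taken from your own referral report.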

Note: You may find other referrals that are not spam but are not relevant for you either, for example, mobile test sites or cache sites. You can create a similar filter with the same configuration and add all the irrelevant referrals there to keep your data pristine and reliable.

Now that you are familiar with the filter window, I won't repeat the full instructions for the following filters. To create them, follow the exact same steps as for the previous 2 filters and change the following fields:

  • Filter name
  • Filter field
  • Filter expression

c. Language filter for sneaky crawlers and bots

From time to time you may see weird languages showing up in your analytics. I prepared an expression that blocks any language value that doesn't have a proper format like es-ES, en-US, fr-FR, etc.

I also added to the expression the language "c", which seems to be left by bots.

  • Create a new filter with the following settings:
    • Filter name: Exclude invalid languages
    • Filter configuration:
      • Filter type: Custom > Exclude
      • Filter field: Language Settings
      • Filter pattern: enter the following expression as it is:
        \s[^\s]*\s|.{15,}|\.|,|^c$
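To see what each part of the language expression catches, here is a small Python check. The sample values are invented for illustration, and case-insensitive matching is an assumption.

```python
import re

# Language expression copied from the filter pattern above.
LANGUAGE_EXPRESSION = r"\s[^\s]*\s|.{15,}|\.|,|^c$"
pattern = re.compile(LANGUAGE_EXPRESSION, re.IGNORECASE)

# What each alternative matches:
#   \s[^\s]*\s  two whitespace characters with only non-whitespace between them
#   .{15,}      any value with 15 or more characters
#   \.  and  ,  any value containing a dot or a comma
#   ^c$         the value that is exactly "c"

# Made-up language values: three normal ones and three that look like junk.
samples = ["en-us", "es-es", "fr-fr", "c", "secret.example you are invited!", "vote, now"]

invalid = [value for value in samples if pattern.search(value)]
valid = [value for value in samples if not pattern.search(value)]

print("Excluded:", invalid)
print("Kept:", valid)
```

Normal language codes are short and contain no spaces, dots, or commas, so none of the alternatives match them.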

d. ISP organization/network filter to stop bot traffic

Not all irrelevant traffic comes from spammers. Some companies use bots to crawl sites for information (indexing, analytics, etc.). Those bots may not have bad intentions, but they still pollute your data.

A typical example is traffic from the service provider google llc that arrives with the referral 127.0.0.1:8888.

Another common example appears if you use Facebook or Instagram ads. If you do, you will notice a considerable amount of bot traffic with the following characteristics:
  • Screen resolution: 2000x2000
  • Country: Peru, Philippines, United States
  • City: (not set), Quezon City, Manila
  • Service provider: Facebook Ireland Ltd

The following filter excludes these ISP organizations, which are used mostly by bots:

  • facebook ireland ltd
  • google llc
  • google inc.
  • alibaba.com llc
  • ovh hosting inc.
  • microsoft corp
  • microsoft corporation
  • hubspot
  • evercompliant ltd.

  • Create a new filter with the following settings:
    • Filter name: Exclude ISP provider bots
    • Filter configuration:
      • Filter type: Custom > Exclude
      • Filter field: ISP organization
      • Filter pattern: enter the following expression as it is:
        hubspot|^google\sllc$|^google\sinc\.$|alibaba\.com\sllc|ovh\shosting\sinc\.|microsoft\scorp|facebook\sireland\sltd|online\ssas|evercompliant|early\sregistration\saddresses
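The Google entries in this expression are anchored with `^` and `$` so that only the exact organization names match. A quick Python check illustrates the difference. The sample organization names are made up, and case-insensitive matching is an assumption.

```python
import re

# ISP organization expression copied from the filter pattern above
# (split only to keep lines short).
ISP_ORG_EXPRESSION = (
    r"hubspot|^google\sllc$|^google\sinc\.$|alibaba\.com\sllc|ovh\shosting\sinc\."
    r"|microsoft\scorp|facebook\sireland\sltd|online\ssas|evercompliant"
    r"|early\sregistration\saddresses"
)
pattern = re.compile(ISP_ORG_EXPRESSION, re.IGNORECASE)

# Made-up sample of ISP organization values.
organizations = [
    "google llc",                        # exact match for ^google\sllc$
    "google fiber inc.",                 # not excluded, because of the anchors
    "microsoft corporation",             # matched by microsoft\scorp
    "facebook ireland ltd",
    "comcast cable communications llc",  # a regular provider in this sample
]

for org in organizations:
    verdict = "excluded" if pattern.search(org) else "kept"
    print(org, "->", verdict)

excluded = [org for org in organizations if pattern.search(org)]
```

Entries without anchors, such as `microsoft\scorp`, match anywhere in the value, which is why one entry covers both "microsoft corp" and "microsoft corporation".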

e. ISP domain/network domain filter to stop bot traffic

This filter is similar to the previous one but this time we will target the ISP domain or network.

  • Create a new filter with the following settings:
    • Filter Name: Exclude ISP domain bots
    • Filter configuration:
      • Filter type: Custom > Exclude
      • Filter field: ISP domain
      • Filter pattern: enter the following expression as it is:

        paloaltonetworks|scaleway|kcura

This filter covers the weird Amazon keywords from Bing organic traffic with the network domain paloaltonetworks.com.
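With the five custom filters in place, their combined effect can be sketched as a simple chain in which a hit is dropped by the first filter that rejects it. This is an illustrative model only. The hits and dictionary field names are invented, the hostname pattern is a placeholder, and the source and ISP organization patterns are shortened versions of the expressions in this guide.

```python
import re

# Filters in the order they are applied: (name, mode, field, pattern).
FILTERS = [
    ("Include Valid Hostnames", "include", "hostname", r"example\.com"),
    ("Exclude Crawler Spam", "exclude", "source", r"semalt|10bestseo|blog100|auto\-?seo"),
    ("Exclude invalid languages", "exclude", "language", r"\s[^\s]*\s|.{15,}|\.|,|^c$"),
    ("Exclude ISP provider bots", "exclude", "isp_org", r"hubspot|^google\sllc$|online\ssas"),
    ("Exclude ISP domain bots", "exclude", "isp_domain", r"paloaltonetworks|scaleway|kcura"),
]

def first_rejecting_filter(hit):
    """Return the name of the first filter that drops the hit, or None if it passes."""
    for name, mode, field, pattern in FILTERS:
        matched = re.search(pattern, hit[field], re.IGNORECASE) is not None
        if (mode == "include" and not matched) or (mode == "exclude" and matched):
            return name
    return None

# Made-up hits: one real visit, one ghost hit, one crawler, one bot.
hits = [
    {"hostname": "example.com", "source": "google", "language": "en-us",
     "isp_org": "comcast cable", "isp_domain": "comcast.net"},
    {"hostname": "(not set)", "source": "free-seo-help.org", "language": "en-us",
     "isp_org": "(not set)", "isp_domain": "(not set)"},
    {"hostname": "example.com", "source": "blog100.org", "language": "en-us",
     "isp_org": "comcast cable", "isp_domain": "comcast.net"},
    {"hostname": "example.com", "source": "bing", "language": "en-us",
     "isp_org": "palo alto networks inc", "isp_domain": "paloaltonetworks.com"},
]

results = [first_rejecting_filter(hit) for hit in hits]
for hit, rejected_by in zip(hits, results):
    print(hit["source"], "->", rejected_by or "kept")
```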

f. Extra: Enabling "Exclude all hits from known bots and spiders"

This is a pre-built filter that takes care of known bots from the IAB bots and spiders list. It is not perfect, but it helps.

It is also a bit easier to set up than the custom filters because you just need to check a box.

How to enable bot filtering

  1. Again in the Admin section of your Analytics, select your Master view under the VIEW column. (Do the same for any other filtered view.)
  2. Click View Settings.
  3. Near the bottom, check the box Exclude all hits from known bots and spiders (Bot Filtering).
  4. Save and repeat the process for your other filtered views.

Next steps to further improve your Analytics data

  1. Clean/remove spam from your historical data.
  2. Exclude internal traffic: This type of junk traffic is often overlooked. If you don't apply filters for the traffic generated by you or other people on your team, this data will get mixed up with your real visitor data and, unlike spam, it is much harder to identify later.


Wrapping it up

Whether you are a blogger, a small local website, or a multinational company, filtering your data is crucial for the accuracy of your reports.

"Even on high volume websites where data spamming would be marginal, you still have to explain why there's such a discrepancy. As an analyst you can't dismiss it simply by saying 'nah... we're not too sure what it is...'"

-Stéphane Hamel

However, you have to do it right. Handling each spammer individually is time-consuming and inefficient. The Google Analytics spam filters explained in this guide may take a bit longer to configure but they will save you a lot of time in the long run.

I will be updating this guide as new threats appear so you can keep it as a reference.

Do you have any questions or feedback?

I've tried to cover every important detail in this guide. However, if there is any part of the guide where you got stuck, let me know in the comments section below.

If this article helped you, consider sharing it or leaving a comment below about your experience. It may help other people! :)
