Filtering spam and bot traffic in Google Analytics: A comprehensive solution
Google Analytics is probably one of the most important elements in your website's decision-making process. How reliably you can judge the success or failure of your efforts (SEO, ad campaigns, social media, etc.) depends on the accuracy of your data.
If you don't take the appropriate steps, unwanted data such as bots, internal traffic, and spam will decrease that accuracy and could lead to poor decision-making.
The good news is that GA has a powerful filter functionality that, if used well, will help you keep junk traffic out of your reports. The bad news is that (in my experience) most sites don't use it correctly.
99% of the sites I audit either don't have any filters, or the ones they have are not correctly configured, which can create an even bigger problem.
So to help you get data that you can trust, I will show you:
- The most effective ways of filtering bots, spam, and other junk traffic in your Google Analytics,
- And, very importantly, how to do it safely so you don't risk your real user data.
A quick FAQ about this guide
To save you some time looking through the comments, here are the answers to some of the most common questions I get:
- Which type of junk traffic does this guide cover?
  - The most common types: bots, crawler spam, ghost spam, and internal traffic.
- Does this work in WordPress, Joomla, Shopify, Wix, Weebly, Squarespace...?
  - Yes. The solutions below are purely based on GA filters, so they will work independently of the CMS you use.
- How often do you check for new threats and update the expressions?
  - I'm constantly monitoring for bots and new spam (3-5 times a week), and I update the expressions when significant new threats are detected. You can keep the guide as a reference or, even better, get notified when new expressions are out (see the historical spam blacklist).
Dos and don'ts when filtering data in Google Analytics
Filters are a powerful tool if used in the right way. So let's quickly go through a list of things you shouldn't do:
- Wrong: Never use the referral exclusion list to filter out traffic; that list has a completely different purpose.
- Wrong: Don't handle spam individually; this is extremely inefficient and will become a nightmare to maintain.
- Wrong: Don't worry about spam/bot data harming your SEO; GA data is not used for rankings in search results.
- Wrong: Don't rely on server-side solutions like WordPress plugins or the .htaccess file to stop ghost spam; they won't have any effect because that type of spam never passes through your server.

Now with that out of the way, let's continue with the solution.
First things first. Protect your data from misconfigurations
Before creating any filter in Google Analytics, you should have at least 2 views: one where you will apply the filters and a second one that you will leave unfiltered. The unfiltered view works as a backup and lets you check the progress of your filters. If you want to be extra cautious, you can create a test view to test your filters before applying them.
Here you can find how to create views and set them up following best practices in Google Analytics.
The 6 filters you need to stop bots and spam in Google Analytics
Once your views are correctly configured, it's time to stop all of that dirty traffic that skews your reports and doesn't let you see the real performance of your site.
It is important to know that there is no single almighty solution or checkbox that will stop all junk traffic at once, so if you want accurate Analytics, you will have to work for it.
General notes about filters.
- Do not apply filters on the Raw data view,
- If you are not comfortable working with filters yet, you can use them in a test view first,
- Filters only work forward, to clean historical data you should use this advanced segment.
The Google Analytics filters you will need are:
a. Filter - Campaign Source to stop crawler referral spam
To block crawler spam you'll need a filter with an expression that matches the campaign source of all crawler spam.
To save you some time, I've created a set of optimized regular expressions (REGEX) covering all the relevant crawler spam detected over the last few years. You'll find them below in the instructions.
How to create a filter to block crawler referrer spam in Google Analytics
To block referrer spam in Google Analytics you will need to create an exclude filter using the campaign source:
- Again go to the admin section of your GA.
- On the last column "VIEW", select Filters and then click + Add Filter
- Enter "Exclude Source - Bots #" as the name for the filter
- Configure the filter as follows:
  - Filter Type: select Custom > Exclude
  - Filter Field: select Campaign Source (don't use the Referral field or it won't work)
  - Filter Pattern: paste one of the following crawler referrer spam expressions. Create 1 filter for each expression.
    - Crawler Expression 1 (TOTAL CHARACTERS: 50):
      (traffic|bot|website)-?(bot|traffic|website|4free)
    - Crawler Expression 2 (TOTAL CHARACTERS: 249):
      (axcus|dotmass|artstart|dorothea|artpress|matpre|ameblo|freeseo|jimto|seo-tips|hazblog|overblog|squarespace|ronaldblog|c\.g456|zz\.glgoo|harriett|webedu|barbarahome|verabauer|deirdre|ninacecillia|reginanahum|deniseconnie|firstblog|maxinesamson)\.top
  These expressions were rebuilt in February 2021. If you created your filter before then, replace all the old expressions and remove any extra filters.
- Once everything is set, click Save.
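If you want to check which campaign sources the two crawler expressions would catch before saving the filters, you can run them locally. The Python sketch below is an approximation, not how GA itself evaluates filters: it uses Python's re module instead of GA's own regex engine, assumes GA's default behavior of matching anywhere in the value and ignoring case, assumes a 255-character limit per filter pattern, and uses sample sources that are hypothetical (bot-traffic.example is made up).

```python
import re

# The two crawler spam expressions from this guide (February 2021 version).
CRAWLER_EXPRESSIONS = [
    r"(traffic|bot|website)-?(bot|traffic|website|4free)",
    r"(axcus|dotmass|artstart|dorothea|artpress|matpre|ameblo|freeseo|jimto"
    r"|seo-tips|hazblog|overblog|squarespace|ronaldblog|c\.g456|zz\.glgoo"
    r"|harriett|webedu|barbarahome|verabauer|deirdre|ninacecillia|reginanahum"
    r"|deniseconnie|firstblog|maxinesamson)\.top",
]

# Assumption: GA accepts at most 255 characters in a filter pattern.
GA_PATTERN_LIMIT = 255


def is_excluded(campaign_source: str) -> bool:
    """Approximate an Exclude filter on Campaign Source.

    re.search matches anywhere in the value (no implicit ^ or $), and
    re.IGNORECASE mimics a filter with "Case Sensitive" left unchecked.
    """
    return any(
        re.search(pattern, campaign_source, re.IGNORECASE)
        for pattern in CRAWLER_EXPRESSIONS
    )


# Each expression must fit in a single filter pattern field.
for pattern in CRAWLER_EXPRESSIONS:
    assert len(pattern) <= GA_PATTERN_LIMIT, "pattern too long for one filter"

# Hypothetical sample campaign sources.
for source in ["google", "l.facebook.com", "squarespace.com", "hazblog.top", "bot-traffic.example"]:
    print(f"{source:22} -> {'excluded' if is_excluded(source) else 'kept'}")
```

Note that squarespace.com is kept: expression 2 only matches the listed names when they are followed by .top, which is why legitimate sites with similar names are not affected.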
You can create an additional filter with the exact same configuration if you find other referrals that are not useful for your Analytics, for example, mobile test sites, project management tools (Basecamp, Asana), monitoring services (uptime), or other spam that is not listed.
b. Filter - Valid hostname for ghost spam and development environments
Nowadays, ghost spam is less frequent than it was a couple of years ago. However, I still recommend having this filter in place in case a new wave arrives.
Also, this filter will help prevent useless traffic from development/staging sites and scrapers.

Here you will find detailed instructions on how to build a valid hostname filter.
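The detailed steps live in the separate guide, but the core idea of a valid hostname filter is an Include filter on the Hostname field: any hit whose hostname is not one of yours is dropped. The Python sketch below shows one possible way to build and check such a pattern. Everything in it is an assumption for illustration: example.com and its subdomains are hypothetical placeholders, Python's re module stands in for GA's regex engine, matching ignores case as GA filters do by default, and 255 characters is assumed as GA's pattern limit.

```python
import re

# Hypothetical hostnames: replace them with every domain that legitimately
# runs your tracking code (main site, subdomains, payment gateway, etc.).
VALID_HOSTNAMES = ["example.com", "www.example.com", "shop.example.com"]

# Assumption: GA accepts at most 255 characters in a filter pattern.
GA_PATTERN_LIMIT = 255


def build_hostname_pattern(hostnames: list[str]) -> str:
    """Join the escaped hostnames into one anchored alternation."""
    escaped = sorted(re.escape(h) for h in set(hostnames))
    pattern = "^(" + "|".join(escaped) + ")$"
    if len(pattern) > GA_PATTERN_LIMIT:
        raise ValueError(f"pattern is {len(pattern)} characters; shorten or split it")
    return pattern


def passes_include_filter(hostname: str, pattern: str) -> bool:
    """Include filter: a hit is kept only if its hostname matches the pattern."""
    return re.search(pattern, hostname, re.IGNORECASE) is not None


pattern = build_hostname_pattern(VALID_HOSTNAMES)
# Hypothetical hostnames: a real page, a staging copy, an empty (ghost) hit,
# and a look-alike domain.
for host in ["www.example.com", "staging.example.com", "", "example.com.fake-site.example"]:
    print(f"{host!r:34} -> {'kept' if passes_include_filter(host, pattern) else 'dropped'}")
```

The anchors (^ and $) are a design choice: they stop look-alike hostnames such as example.com.fake-site.example from passing, at the cost of having to list every legitimate subdomain explicitly.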
c. Filter - Browser size (not set)
The previous filter used to be great for ghost spam sent through the Measurement Protocol. However, spammers keep getting creative: some of them now crawl sites to grab their hostname and Analytics UA ID, which lets them bypass the hostname filter. In those cases, this filter can help.
Important: If you are using 3rd-party tools that send data to your Google Analytics through the Measurement Protocol, e.g., call tracking tools like CallRail, don't use this filter; skip to the next one.
- Create a new filter with the following settings:
  - Filter name: Exclude Browser Size - Spam
  - Filter configuration:
    - Filter type: Custom > Exclude
    - Filter field: Browser size
    - Filter pattern: enter the following expression as it is: ^$
Note: even though you see (not set) in your Google Analytics, this value is not added until the hit reaches GA, so creating filters for (not set) in any dimension won't have any effect. Instead, the filter uses ^$, a REGEX that matches an empty value.
You can use this same REGEX to filter any other (not set) dimension if you need to, e.g., Language, Browser version, etc.
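To see why ^$ works where a literal "(not set)" pattern would not, here is a tiny Python sketch of the Exclude filter on Browser Size. It is an illustration only: Python's re module stands in for GA's regex engine, and the sample values are hypothetical.

```python
import re

# The pattern from this section: "^$" matches only an empty string.
EMPTY_VALUE = r"^$"


def is_dropped_by_browser_size_filter(browser_size: str) -> bool:
    """Approximate the Exclude filter on Browser Size.

    When the filter runs, a hit without browser size information carries an
    empty value; GA only labels it "(not set)" later in the reports, so a
    pattern looking for the text "(not set)" would never match anything.
    """
    return re.search(EMPTY_VALUE, browser_size) is not None


# Hypothetical values: a normal browser size and an empty one.
for value in ["1366x768", ""]:
    print(repr(value), "->", "dropped" if is_dropped_by_browser_size_filter(value) else "kept")
```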
d. Filter - Language for sneaky crawlers and bots
From time to time, you may see weird languages showing up in your Analytics. I prepared an expression that excludes any language value that doesn't have a proper format, like es-ES, en-US, fr-FR, etc.
I also added the language "c" to the expression, since it also seems to be left by bots.
- Create a new filter with the following settings:
  - Filter name: Exclude Language - Bots
  - Filter configuration:
    - Filter type: Custom > Exclude
    - Filter field: Language Settings
    - Filter pattern: enter the following expression as it is: \s[^\s]*\s|.{15,}|\.|,|^c$
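The language expression packs five conditions into one line. The Python sketch below breaks it into its parts and tries it on a few values, which can help if you want to check a language you see in your reports before it gets filtered. It approximates GA with Python's re module, assumes GA's default unanchored, case-insensitive matching, and uses hypothetical sample values.

```python
import re

# The Language Settings expression from this guide, part by part:
#   \s[^\s]*\s  two whitespace characters with only non-space characters in
#               between; matches any value containing 2+ whitespace characters
#   .{15,}      any value 15 characters or longer
#   \.          a dot, as found in domain names
#   ,           a comma
#   ^c$         the bare value "c"
LANGUAGE_EXPRESSION = r"\s[^\s]*\s|.{15,}|\.|,|^c$"


def is_excluded_language(value: str) -> bool:
    """Approximate the Exclude filter on Language Settings."""
    return re.search(LANGUAGE_EXPRESSION, value, re.IGNORECASE) is not None


# Hypothetical values: real language codes, the bot value "c",
# and a spam-like sentence.
for value in ["en-us", "es-es", "cs", "c", "visit my-site.example now"]:
    print(f"{value!r:30} -> {'excluded' if is_excluded_language(value) else 'kept'}")
```

Note that "cs" (Czech) is kept: the ^c$ part is anchored on both sides, so it only catches the single letter c.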
e. Filter - Static and dynamic filters for internal traffic
Not all junk traffic in Analytics comes from outside your company. In fact, a lot can come from within your team: developers, testers, marketers, support, curious employees, etc.
This type of junk traffic is often overlooked, and if you don't filter it, it can easily get mixed up with the data of your real visits. Unlike spam, it is much harder to identify later.
- Guide to filter static IP,
- Guide to filter internal traffic dynamically.
f. Bonus - Enabling "Exclude all hits from known bots and spiders"
This is a pre-built feature that takes care of known bots from the IAB bots and spiders list. It is not perfect, but it may help.
How to enable bot filtering
- Back in the Admin section of your Analytics, select your Master view under the VIEW column (do the same for any other filtered view).
- Click View Settings
- Near the bottom, check the box Exclude all hits from known bots and spiders (Bot Filtering)
- Save and repeat the process with all your Views
What's next? Clean junk traffic from past data
As you know, filters only work forward. To clean spam and bots from your history, you will need to create an advanced segment using this guide:
Additional resources
- Comprehensive spam/bot blacklist.
- Answers to common concerns about spam and bots in Google Analytics:
- Does the spam harm my SEO-Rankings?
- How does it get in your reports?
- and many more.
Wrapping it up
Your Google Analytics is only as good as the data it contains. If you don't filter it properly, you can end up with inflated reports that don't represent the real performance of your site.
"Even on high-volume websites where data spamming would be marginal, you still have to explain why there's such a discrepancy. As an analyst, you can't dismiss it simply by saying 'nah... we're not too sure what it is...'"
The filters and pre-built expressions in this guide will help you keep your Analytics data in good shape, so you can feel confident when you make decisions based on it.
I will be updating this guide as new threats appear so you can keep it as a reference.
Do you have any questions or feedback?
I've tried to cover all the important details in this guide; however, if there is any section where you are experiencing difficulties, please let me know in the comments section below.
If this article helped you, consider leaving a comment below with your experience; it may help other people!