F.A.Q.: Google Analytics Spam
I've been following the Google Analytics spam issue closely for a long time, and after 3 years I still keep finding articles with partial solutions, or even with solutions that do more harm than good.
In this post, I answer some of the most common questions and concerns I have received over the past years from Analytics users affected by spam.
So let's begin.
Is Google doing something to handle this threat?
This is one of the most common questions I get from my readers: "Is the GA team doing something to stop it?"
The truth is that they are constantly fighting the spam. Believe me, it could be a lot worse if they weren't doing anything to control it.
Is it enough? It is hard to tell. The problem is that every time Google implements a measure, the spammers find other ways to get through.
Remember that Google Analytics is a free service (unless you are one of the few using the premium version) used by millions of sites. This has two consequences: (1) it is extremely attractive to spammers, and (2) any change to the core of the service has a huge impact on all users. So they have to get it right without disrupting or interfering with other functionality.
How to efficiently filter the Spam in Google Analytics?
The spam has a fairly simple solution. The problem is that many articles offer partial solutions that don't tackle it efficiently, so users have to keep making updates.
To avoid this, I built a detailed guide with the most efficient solutions against the spam in Google Analytics. These solutions have been proven to work for almost 3 years. Here are some examples of sites that used these solutions:
You can go to the Ultimate Guide for Getting Rid of the Spam in Analytics. If you don't have the time to do it yourself, I can personally help you.
What is Referrer Spam?
Referrer (or referral) spam is a fake URL sent to Google Analytics to attract people to that URL and promote a service or product, and in some cases to spread malware by getting you to insert code on your site.
Originally, the spammers sent fake data only as referrals, but now you can find it throughout your reports: as keywords (organic), pages, events, or even languages.
What are the most common types of spam?
The most active types of spam are:
- Crawler Spam - this was the "original" spam, it uses bots to leave fake referrals.
- Ghost Spam - this is the most common and aggressive. You can find it almost anywhere in your reports and it only affects Google Analytics, it never passes through your site.
- Bot Direct Traffic - This is not technically spam but spiders (aka bots) compiling information. Some are good and some are not.
Each one of them has different characteristics and, therefore, requires a different way of dealing with it.
What is Crawler Referrer Spam?
Crawler spam comes from a spider programmed to navigate through sites and leave fake referrals in Analytics and in the site's logs. These URLs attract users to the spammer's site when they go looking for information about them.
Examples: All semalt.com variations, uptime-alpha.net
- These crawlers usually ignore rules such as robots.txt.
- Crawler spam is far less frequent than Ghost spam since it requires more resources from the spammer.
- Although they can be blocked using server solutions (htaccess, web.config, WordPress plugins), it is recommended to use filters in GA, since the number of hits is very low.
Note: A common mistake when filtering this type of spam is using the Referral field in the filter. Use the Campaign Source field instead; otherwise, the filter won't work.
What is Ghost Spam?
One difference between crawler spam and ghost spam is that ghost spam NEVER accesses your site. Ghost spam is essentially fake data sent directly to Google Analytics servers. This type of spam only needs an existing tracking ID to hit. It doesn't matter whether that ID is installed on an active site or not. Some of its key characteristics are:
- No matter if you use WordPress, Joomla, Shopify, or any other Content Management System (CMS), the only way to stop ghost spam in Google Analytics is with filters.
- Server-side solutions like WordPress plugins, the htaccess file or the web.config are useless.
- You can find ghost spam in almost any report: Referral, Organic, Direct, Language, and Events.
Spammers prefer this type of spam, and it is currently the most common.
If it never visits my site, how does ghost spam hit my Analytics?
They use the Analytics Measurement Protocol to reach your Analytics directly, without passing through your site. This protocol is intended to let developers send data directly to Google Analytics servers to measure how users interact with their business from almost any environment.
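To make this concrete, here is a minimal Python sketch that only builds a Universal Analytics Measurement Protocol (v1) pageview hit. It never sends one. It assumes the v1 parameters `v`, `tid`, `cid`, `t`, `dh` (document hostname), `dp` (page), `cs` (campaign source) and `cm` (campaign medium). The tracking ID, client ID and domains are hypothetical placeholders. The point is that the hostname and the source are simply strings chosen by whoever builds the hit, which is why ghost spam carries fake hostnames.

```python
from urllib.parse import urlencode

# Collection endpoint of the Universal Analytics Measurement Protocol (v1).
COLLECT_URL = "https://www.google-analytics.com/collect"

def build_hit(tracking_id: str, hostname: str, page: str,
              source: str, medium: str) -> str:
    """Build (but do not send) a pageview hit URL.

    Every value is chosen freely by the sender: nothing here proves the
    visit ever touched the site named in `hostname`.
    """
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # the only thing the sender really needs
        "cid": "555",        # hypothetical client ID
        "t": "pageview",     # hit type
        "dh": hostname,      # document hostname: can be anything
        "dp": page,          # document path
        "cs": source,        # campaign source
        "cm": medium,        # campaign medium
    }
    return f"{COLLECT_URL}?{urlencode(params)}"

# Hypothetical values; no request is made.
url = build_hit("UA-000000-1", "fake-host.example", "/",
                "spam-site.example", "referral")
print(url)
```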
Contrary to what some people think, the spammer doesn't get your tracking ID from your page. That would require crawling your site, which takes more work and effort.
What they most likely do is generate random codes with the GA pattern (UA-XXXXXX-Y) in combination with an automated script that sends the fake data to thousands of Properties.
Are you sure Ghost Spam never accesses my site? (demonstration)
I sometimes get emails asking how it is possible to get traffic in GA if the spam never passes through the site.
So I decided to make a small demonstration that shows how it hits your Reports directly and why server solutions won't work. I took a segment of a Google Analytics Report with all Referrer Spam (crawler and ghost) that hit my site in March 2015.
I used AWStats to analyze the access log of my site (same month) and looked for the name of all the Spam on the previous list.
As you can see, only semalt.com and buttons-for-website.com (marked in red), which are crawlers, are logged. The rest (marked in blue) are all ghost spam, and there is no trace whatsoever of them in the local access log.
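I used AWStats for this demonstration. As a hypothetical alternative, you can run the same check with a short Python script. This sketch assumes your server writes the Apache/Nginx "combined" log format, in which the referrer is the second-to-last quoted field. It reports which domains from a spam list actually appear as referrers. Crawlers should show up. Ghost spam domains should not.

```python
import re
from urllib.parse import urlparse

# Combined Log Format: ... "REQUEST" STATUS SIZE "REFERRER" "USER-AGENT"
LOG_LINE = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)" "[^"]*"$')

def spam_in_log(lines, spam_domains):
    """Return the spam domains that appear as referrers in the access log."""
    found = set()
    for line in lines:
        match = LOG_LINE.search(line.rstrip("\n"))
        if not match:
            continue  # skip lines that aren't in combined format
        host = urlparse(match.group("referrer")).hostname or ""
        for domain in spam_domains:
            # Match the domain itself or any of its subdomains.
            if host == domain or host.endswith("." + domain):
                found.add(domain)
    return found

# Hypothetical usage against a real log file:
# with open("access.log") as log:
#     print(spam_in_log(log, ["semalt.com", "buttons-for-website.com"]))
```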
What is the valid hostname filter and why is it so important?
The valid hostname filter is the best solution against spam in Google Analytics. There are four huge advantages to using this filter:
- It's preventive, unlike the campaign source filter.
- It requires little maintenance, since one filter does all the work.
- It will stop any form of ghost spam, whether it shows up as a referral, organic, event, or direct visit.
- It helps keep out other irrelevant traffic, such as hostnames that you use for testing.
How does the "valid hostname filter" work?
All ghost spam (the most common and obnoxious type of spam) leaves a fake hostname in your reports. Since the spammer never visits your site and doesn't know it, all of its data is faked, including the hostname. By creating a filter that includes only your valid hostnames, you automatically leave the spam out.
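The logic of an Include filter on the Hostname field can be sketched in Python. This assumes GA filter patterns behave like an unanchored regular expression, so a match anywhere in the field counts. That is why the sketch uses `re.search`. The hostnames in the pattern are hypothetical. Replace them with your domain and any services where you deliberately installed the tracking code.

```python
import re

# Hypothetical include pattern: your domain plus services you actually use.
VALID_HOSTNAMES = re.compile(r"ohow\.co|youtube\.com")

def keep_hit(hostname: str) -> bool:
    """Mimic an Include filter on the Hostname field.

    re.search matches anywhere in the string, like an unanchored
    GA filter pattern.
    """
    return bool(VALID_HOSTNAMES.search(hostname))

for host in ["www.ohow.co", "(not set)", "free-share-buttons.example"]:
    print(host, "->", "keep" if keep_hit(host) else "drop")
```

Because the pattern is unanchored, a hostname like `ohow.co.example.net` would also pass. Adding `^`/`$` anchors makes the filter stricter, but then you must list every valid variant explicitly.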
What is a hostname in Google Analytics?
A hostname is every place where one of your visits arrives. It will mainly be your domain but it could also be services where you added your GA tracking code. Every visit in your Google Analytics will have a source and a hostname.
- Source: the place where the visit originates (e.g., referral, organic, direct, social).
- Hostname: the place where the visit arrives. In most cases, this is your domain.
To make it clearer, I will give you an example:
If we consider a visit that comes from Facebook to this article:
Facebook >> www.ohow.co/ultimate-guide-to-removing-irrelevant-traffic-in-google-analytics/
The visit will be recorded in Google Analytics like this:
- Hostname: www.ohow.co
- Source (referral): facebook.com
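The same split can be reproduced from the two URLs involved in a visit. This minimal sketch uses Python's `urllib.parse`. One assumption: it strips a leading `www.` from the source only so the output matches the example above. It is not a claim about exactly how GA normalizes sources.

```python
from urllib.parse import urlparse

def describe_visit(landing_url: str, referrer_url: str) -> dict:
    """Split a visit into the hostname it arrived at and its source."""
    hostname = urlparse(landing_url).hostname
    source = urlparse(referrer_url).hostname or "(direct)"
    # Assumption: drop a leading "www." to match the example above.
    if source.startswith("www."):
        source = source[len("www."):]
    return {"hostname": hostname, "source": source}

visit = describe_visit(
    "https://www.ohow.co/ultimate-guide-to-removing-irrelevant-traffic-in-google-analytics/",
    "https://www.facebook.com/",
)
print(visit)
```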
Why Should You Use Campaign Source Instead of Referral?
It may seem reasonable to use the Referral field on the filters to try to exclude Referrals, however, that is not how you should do it.
Instead, you should use Campaign Source. But why?
First of all, it is what the Analytics documentation recommends for filtering referrals, whether they are spam or not.
To expand on why: a valid visit usually comes with a valid value in the HTTP Referer header. However, that's not the case for most spam, and not even for some real sources.
When spammers send spam, they add the source and medium using UTM parameters, so the hits appear as referrals but carry no HTTP Referer value. That is why a filter on the Referral field won't work.
On many occasions, a filter on the Referral field may seem to work because some spam has a short lifespan of weeks or even days. The spam just stops coming, but it was never blocked.
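The reasoning above can be illustrated with a deliberately simplified Python model. It is a hypothetical sketch, not GA's actual processing. In the model, Campaign Source comes from `utm_source` when present and otherwise from the Referer header, while the Referral field only ever sees the Referer header. A spam hit that sets its source through UTM parameters with no Referer therefore slips past a Referral filter.

```python
import re
from urllib.parse import urlparse, parse_qs

def filter_fields(landing_url: str, referer_header):
    """Simplified model of what each filter field would see for one hit."""
    query = parse_qs(urlparse(landing_url).query)
    referral = referer_header or None
    if "utm_source" in query:
        campaign_source = query["utm_source"][0]
    elif referer_header:
        campaign_source = urlparse(referer_header).hostname
    else:
        campaign_source = "(direct)"
    return {"campaign_source": campaign_source, "referral": referral}

# Hypothetical spam hit: source set via UTM parameters, no Referer header.
spam = filter_fields(
    "https://www.example.com/?utm_source=spam-site.example&utm_medium=referral",
    None,
)
pattern = re.compile(r"spam-site")
print("Referral filter matches:",
      bool(spam["referral"] and pattern.search(spam["referral"])))
print("Campaign Source filter matches:",
      bool(pattern.search(spam["campaign_source"])))
```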
Does the Spam harm my SEO/SERP?
The short answer is NO, at least not directly. However, since the spam corrupts your data and may cloud your decisions, it can affect your SEO to some extent.
As for the data left by the spammer, such as the bounce rate or the Avg. Session Time, you shouldn't worry about it. Google has officially stated that it doesn't use Google Analytics data as a ranking factor in any way, and John Mueller, Webmaster Trends Analyst at Google, recently confirmed it.
If you think about it, using Google Analytics data for rankings wouldn't make sense, for two main reasons.
- First, although GA is widely used, not every website uses it, so it wouldn't be a fair benchmark.
- Second, the data in GA can easily be manipulated in many ways. For example, people could fake the bounce rate: if you insert the tracking code multiple times on your site, the bounce rate will be close to 0.
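The bounce-rate example comes down to simple arithmetic. A bounce is a session with a single hit, so if a duplicated tracking code fires two pageviews on every page load, no session ever has just one hit. This sketch uses a simplified bounce definition and hypothetical session data.

```python
def bounce_rate(hits_per_session):
    """Share of sessions with exactly one hit (simplified bounce definition)."""
    bounces = sum(1 for hits in hits_per_session if hits == 1)
    return bounces / len(hits_per_session)

sessions = [1, 1, 2, 3]                        # hypothetical pageviews per session
duplicated = [hits * 2 for hits in sessions]   # tracking code installed twice

print(bounce_rate(sessions))    # 0.5
print(bounce_rate(duplicated))  # 0.0
```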
That said, we have to look at the other side. SEO is not only about your rankings. It also involves analyzing your data to make better decisions that improve your position in the search results (SERP).
In that sense, the answer is yes. Spam, like any other irrelevant traffic that lowers the accuracy of your reports, affects your SEO, because the data you are analyzing is polluted and may mislead your decisions.
Does the Spam Represent a Security Issue?
No, as long as you don't insert any script from the spammer website.
Sometimes ghost spam leaves weird pages in Google Analytics, and people think their website was hacked in some way. But as you already know, it is all fake, injected by the spammer directly into your GA reports.
Just make sure it really is spam and not real pages that were somehow injected into your website. If you can open the page on your site, you might have been hacked.
How did the spammer target my analytics?
The truth is that they don't pick your Analytics property specifically. They target random tracking IDs in the form UA-000000-1, and yours just happened to be on the list.
What is the purpose of the Spam?
You may wonder how they benefit from this. People are naturally curious, and they want to know what is going on with their websites, so they visit the URL of the referral without knowing it is fake. The surprise comes when they find that there is no mention of their site at all.
The spammers hit thousands of Google Analytics properties, so you can imagine the amount of traffic they get with this black-hat technique.
While the common purpose is to lure people to visit the fake referral, the final objective changes:
- Promote a page
- Get your email
- Sell you a service
- Try to make you insert a script on your site (case of free-share-button)
- Redirect you to an online store where they get a commission through an affiliate program. A common store used is aliexpress.com
How to detect Spammy traffic?
Not all unusual traffic is spam, so before filtering it, you should do a little research.
First, you can check if the odd visit is on this list that is constantly updated.
If you can't find it there, try searching for it. Don't type the URL directly in the browser, or you will be redirected to the spammer's site. Instead, search for it like this: suspicioussite.com / referral
If you still can't find information about it, you can analyze the data left by the spam. Ghost spam is easier to spot, since all of its data is fake. Just check the hostname of the referral. If it is (not set) or some weird name that doesn't belong to you, then it's spam. Below I have highlighted some examples in red.
Crawlers (orange), on the other hand, are harder to detect because they do leave real data. You can try using a combination of the following characteristics to find if it is spam:
- Landing Page and Page Title: Homepage
- Bounce rate: either close to 0% or close to 100%
- Avg. Session Time: close to 0 seconds.
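The checks above can be combined into a rough triage script for exported report rows. This is a minimal Python sketch under stated assumptions. The hostname rule comes from the text above. The valid hostnames, the bounce-rate cut-offs (5% and 95%) and the 1-second session threshold are hypothetical values you should tune. The result is a hint for further research, not a verdict.

```python
from dataclasses import dataclass

@dataclass
class ReferralRow:
    source: str
    hostname: str
    landing_page: str
    bounce_rate: float          # 0.0 to 1.0
    avg_session_seconds: float

# Hypothetical: the hostnames where your tracking code really runs.
VALID_HOSTNAMES = {"www.ohow.co", "ohow.co"}

def classify(row: ReferralRow) -> str:
    """Rough triage following the checks above (thresholds are assumptions)."""
    # Ghost spam: the hostname is (not set) or doesn't belong to you.
    if row.hostname == "(not set)" or row.hostname not in VALID_HOSTNAMES:
        return "ghost spam"
    # Crawler spam: homepage landing, extreme bounce rate, near-zero duration.
    looks_like_crawler = (
        row.landing_page == "/"
        and (row.bounce_rate <= 0.05 or row.bounce_rate >= 0.95)
        and row.avg_session_seconds < 1
    )
    return "possible crawler spam" if looks_like_crawler else "probably real"
```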
Can I use the Referral Exclusion List for Spam?
No! This is one of the most common mistakes. The purpose of this feature is to exclude real referrals so they don't trigger a new session, for example, to keep payment gateways from being counted as referrals.
Adding spam to this list will only strip the referral part and leave it as a direct visit instead. That is even worse, since it is harder to detect and filter later.
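A deliberately simplified Python model, not GA's actual attribution logic, shows why this backfires. An excluded referrer loses its referral information and ends up counted as direct, where it can no longer be filtered by source. The domains are hypothetical.

```python
def attributed_source(referrer_host: str, exclusion_list: set) -> str:
    """Simplified model: excluded referrers lose their source and show as direct."""
    if referrer_host in exclusion_list:
        return "(direct)"
    return referrer_host

exclusions = {"spam-site.example"}  # the mistake: spam on the exclusion list
print(attributed_source("spam-site.example", exclusions))  # (direct)
print(attributed_source("facebook.com", exclusions))       # facebook.com
```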
When is it OK to use the referral exclusion list?
Third-party payment processors
If you use third-party payment processors like PayPal, Shopify, etc., consider adding them to the Exclusion List.
Subdomains
If you use the same tracking code across your subdomains, add your own domain to the Exclusion List.
Why You Can't Use Server Solutions (WordPress plugins, .htaccess) for Ghost Spam?
If you read how ghost spam hits your Analytics, you now know that it never passes through your site. So trying to block it with server-side solutions like plugins, the .htaccess file, or the web.config file won't do any good.
In the worst-case scenario (and I've seen this a lot), it can take your site down completely, because these files are very sensitive and just one misplaced character can cause a lot of trouble.
You can block crawler spam with this method, but the amount of traffic it generates is very low. My recommendation is to use filters for both types of spam.
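For reference, this is roughly what a server-side referrer block does, shown here as a Python WSGI middleware. It is a swapped-in illustration, not the .htaccess or plugin approach itself. The blocked domains are the crawler examples mentioned earlier in this post. A rule like this can only ever catch crawlers, because ghost spam never sends a request to your server.

```python
from urllib.parse import urlparse

# Crawler-spam domains (crawlers only: ghost spam never reaches the server).
BLOCKED_REFERRERS = ("semalt.com", "buttons-for-website.com")

def block_crawler_referrers(app):
    """WSGI middleware: answer 403 when the Referer is a blocked domain."""
    def middleware(environ, start_response):
        host = urlparse(environ.get("HTTP_REFERER", "")).hostname or ""
        if any(host == d or host.endswith("." + d) for d in BLOCKED_REFERRERS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)
    return middleware

def hello_app(environ, start_response):
    """Stand-in for your real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]

application = block_crawler_referrers(hello_app)
```

To try it locally, you could serve it with `wsgiref.simple_server.make_server("", 8000, application).serve_forever()`.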
Why am I getting "This filter would not have changed your data..."?
There are 2 common reasons. First, your filter may not be configured correctly. Second, the Filter Verification feature uses only a sample of your data, so if it doesn't find a match in that sample, you will get this message.
If you are sure you configured the filter correctly just ignore the message. If you still want to double-check, you can use an advanced segment to test your filter.
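To see why a correct filter can still trigger this message, consider how easily a small sample misses rare spam. This Python sketch computes the hypergeometric probability that a random sample contains none of the matching rows. The exact sample size the verification tool uses isn't stated here, so all the numbers are hypothetical: 10,000 hits, 5 of them spam, 1,000 sampled.

```python
from math import comb

def chance_sample_misses(total_rows: int, matching_rows: int,
                         sample_size: int) -> float:
    """Probability that a random sample (without replacement) has no matches."""
    return comb(total_rows - matching_rows, sample_size) / comb(total_rows, sample_size)

# Hypothetical numbers, not the tool's real sampling parameters.
p = chance_sample_misses(10_000, 5, 1_000)
print(f"Chance the sample contains no spam rows: {p:.0%}")
```

With these assumptions, the sample misses every spam row more often than not, even though the filter itself is fine.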
Are there other ways of preventing Spam?
I highly recommend filtering the spam from Google Analytics, even the crawlers. However, if you would prefer to use other methods you could try these:
Blocking the spam from your server (ONLY Crawlers)
You can use configuration files and rules to block spam from your server. Just be aware that this will only work against a small portion of the spam: the crawlers. Ghost spam never visits your site, so it's not possible to block it from there.
Changing your tracking ID number (Only new accounts)
This method doesn't exactly block spam, but it makes your Google Analytics property less attractive to spammers. It is a good option for new websites. Since spam usually targets tracking IDs ending in -1, switching to a property with a higher number, such as UA-XXXXXXX-3, will keep some of it from reaching you.
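To check which property number your tracking ID uses, you can parse the UA-XXXXXX-Y pattern. This is a minimal Python sketch. The example IDs are hypothetical, and the pattern covers only Universal Analytics IDs.

```python
import re

TRACKING_ID = re.compile(r"^UA-(\d+)-(\d+)$")

def property_index(tracking_id: str) -> int:
    """Return the trailing property number Y of a UA-XXXXXX-Y tracking ID."""
    match = TRACKING_ID.match(tracking_id.strip())
    if match is None:
        raise ValueError(f"not a UA tracking ID: {tracking_id!r}")
    return int(match.group(2))

for tid in ["UA-12345678-1", "UA-12345678-3"]:  # hypothetical IDs
    index = property_index(tid)
    note = "common spam target" if index == 1 else "less commonly targeted"
    print(tid, "->", index, f"({note})")
```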
To do this, just create a new property under the Admin section of your Analytics.
Is there a Spam List?
Here you can find a comprehensive list of the spam that has hit Google Analytics over the last couple of years.
Do you have any other questions?
I tried to cover the most common questions I get about this issue. If you have any other questions or if something is unclear, leave a comment and I will do my best to help!
Excellent resources that helped build this guide.
Thanks to Ben from Viget and Nick from cucumber.co for their help in building this article.