Why is there so much (direct) traffic showing in your Google Analytics?
Finding out why you suddenly see lots of direct traffic sessions in your GA reports can be tricky because these sessions carry no source information.
This post will help you determine what is causing this unexpected traffic and, most importantly, how to clean it from your reports so you can have accurate analytics.
This post focuses only on direct traffic from bots. For spam of any type (referral, keyword, page, etc.), follow this other guide.
Possible causes of unexpected direct traffic
I will split the common reasons for spikes in direct traffic into relevant and irrelevant sources.
Relevant direct traffic
This is traffic that comes from real users and adds valuable data to your Analytics:
- Loyal readers returning to your site; there is nothing wrong with this traffic. Ideally, this would be the only type of direct traffic in your reports. Examples include visitors who bookmarked your page or typed your URL directly into the browser.
- Incorrectly tagged campaigns; even though this is relevant traffic, you still need to fix it. If you recently launched a campaign, check that your links are properly tagged and are not missing any UTM parameters, especially utm_source and utm_medium.
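If you have a list of campaign links, the tagging check described above can be automated. Here is a minimal Python sketch, assuming the links are available as plain strings; the function name `missing_utm_params` and the example URLs are hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# The two parameters called out above as the most important ones.
REQUIRED_UTM = ("utm_source", "utm_medium")


def missing_utm_params(url, required=REQUIRED_UTM):
    """Return the required UTM parameters that are absent or empty in url."""
    # parse_qs drops blank values by default, so "utm_medium=" counts as missing.
    query = parse_qs(urlparse(url).query)
    return [name for name in required if not query.get(name, [""])[0].strip()]


# Hypothetical campaign links; replace them with your own.
links = [
    "https://example.com/?utm_source=newsletter&utm_medium=email",
    "https://example.com/?utm_source=newsletter",
]
for link in links:
    print(link, missing_utm_params(link))
```

Any link that returns a non-empty list is a candidate for showing up as direct traffic or with incomplete source data.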
Irrelevant direct traffic
This traffic doesn't add any value and should be excluded from your Analytics.
- Internal traffic improperly filtered, especially if you recently did heavy testing on the site. To avoid this, set up IP filters for you and your team or, better yet, block internal traffic dynamically with GTM and cookies.
- Ghost traffic done wrong; this is rare, but sometimes spammers forget how to spam. To stop it, simply create a valid hostname filter.
- Referral Exclusion List used for spam; people often mistakenly use this list to filter spam. The list has a totally different purpose, and using it for spam will only strip the referral and leave the hit as a direct visit. To solve this, remove all the spam from the list and use a filter instead.
- Bot direct traffic; this is the most common scenario and also the most complex to solve, so I will focus the rest of the post on it.
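The valid hostname filter mentioned in the list above boils down to one regular expression that matches only the hostnames where your tracking code legitimately runs. This is a rough Python sketch of how such an expression could be built and tested before pasting it into a filter; the hostnames and helper names are hypothetical.

```python
import re

# Hypothetical hostnames; replace them with the domains you actually track.
VALID_HOSTNAMES = ["example.com", "www.example.com", "shop.example.com"]


def build_hostname_pattern(hostnames):
    """Build one anchored regex that matches only the listed hostnames."""
    escaped = "|".join(re.escape(h) for h in hostnames)
    return f"^({escaped})$"


pattern = re.compile(build_hostname_pattern(VALID_HOSTNAMES))


def is_valid_hostname(hostname):
    """True if the hostname recorded on a hit is one of your own."""
    return pattern.match(hostname.lower()) is not None


print(build_hostname_pattern(VALID_HOSTNAMES))
```

Escaping the dots and anchoring the pattern keeps look-alike hostnames from slipping through.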
Dealing with Direct Traffic Caused by Bots
The first thing you should do, if you haven't done it yet, is enable the bot filtering option in your Google Analytics. This will exclude hits from known bots and spiders on the IAB list.
Unfortunately, most bots are not included in this list, so you will need to exclude the ones that are affecting you manually.
What is a Bot?
A bot (aka web crawler, spider or robot) is an automated program or script that browses the internet gathering information. Some of them are beneficial to your site, like Googlebot, while others are irrelevant.
However, no matter what the purpose of the bot is, the data left by bots in your Analytics is useless and may interfere with your real users' data.
Is this bot direct traffic considered spam?
I see many people on Analytics forums calling this "Direct Spam Traffic." But is it?
To call it "spam", the bot should leave information like a URL with the intention of promoting a service, an idea, or getting something from you, like referral spam. Bot direct traffic doesn't match that, so I wouldn't consider it spamming.
Where do these bots come from?
There are thousands of bots crawling the web for different purposes; there are good and bad bots.
|Good Bots|Bad Bots|
|Search engine spiders|Spammers|
|Statistics sites|Scraping sites|
|Analytics services|Bots used to skew your resources (like a DDoS attack)|
|Ad networks|Testing tools|
Here is a great breakdown of bot traffic from Incapsula:
All bots, no matter if they have good or bad purposes, are totally irrelevant for analysis and should be excluded from your reports in Google Analytics.
In extreme cases like DDoS attacks, you will also need to block them at your server; hosting services are usually very helpful with this type of thing.
How to identify bot traffic in GA
Real user and bot traffic can share some characteristics, so it is important to isolate the traffic that comes only from bots before filtering or segmenting it out.
Let's start with a quick analysis of the data.
If you are experiencing dozens, hundreds or even thousands of direct visits out of nowhere, with a bounce rate close to 100% and an Avg. Session Duration close to 0s, then you are most probably receiving bot traffic.
Common characteristics of bot direct traffic are:
- A sudden spike in direct visits.
- Default Channel Grouping: Direct
- Landing Page: most of the time it is your home page, usually represented by a forward slash / or /index.html
- Bounce Rate is usually very high, close to 100%
- Avg. Session Duration is very low, close to 0 seconds
- Pageviews average 1 per session
Note: Not all data matching these characteristics comes from bots, so don't go filtering all traffic that matches one or more of them.
You will need to do a bit more analysis on your data to detect a specific characteristic that can be safely filtered. If you can't find one, you can use a segment to combine multiple conditions.
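If you export session-level rows from your reports, the characteristics listed above can be checked in bulk. This is only a sketch in Python: the field names (`channel`, `landing_page`, `pageviews`, `duration_s`, `bounced`) and the sample rows are assumptions, not an actual GA export format. In line with the note above, it flags candidates for further analysis, not confirmed bots.

```python
# Hypothetical session rows; the field names are assumptions for illustration.
sessions = [
    {"channel": "Direct", "landing_page": "/", "pageviews": 1, "duration_s": 0, "bounced": True},
    {"channel": "Direct", "landing_page": "/", "pageviews": 1, "duration_s": 0, "bounced": True},
    {"channel": "Direct", "landing_page": "/pricing", "pageviews": 3, "duration_s": 95, "bounced": False},
    {"channel": "Organic Search", "landing_page": "/", "pageviews": 1, "duration_s": 0, "bounced": True},
]


def looks_like_bot(session, max_duration_s=1):
    """True only when a session matches ALL the typical bot characteristics."""
    return (
        session["channel"] == "Direct"
        and session["landing_page"] in ("/", "/index.html")
        and session["bounced"]
        and session["pageviews"] <= 1
        and session["duration_s"] <= max_duration_s
    )


suspects = [s for s in sessions if looks_like_bot(s)]
share = len(suspects) / len(sessions)
print(f"{len(suspects)} of {len(sessions)} sessions ({share:.0%}) need a closer look")
```

Requiring every condition at once is deliberate: any single characteristic on its own is also common among real users.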
Most of the time, bots repeat their actions from the same system over and over, leaving an identifiable trail. For example, a bot may run on Windows 7 with Chrome 43 and Flash version 11.
To find this trail, go to the Direct traffic report in Analytics, select the home page (/) and start adding different secondary dimensions to find common patterns. The more you find, the better!
Dimensions worth checking:
- Browser/Browser version
- Operating system/OS version
- Browser size
- ISP or Network domain
- Flash version
Tip: Open a second window with the same report and select dates when the traffic was normal, then compare the data (browser versions, OS, Flash version).
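The comparison in the tip above can also be done on exported data. This Python sketch computes the share of sessions for each value of a dimension in the spike period and in a normal period, then lists the values whose share grew the most. The row format, the `browser_version` field, the sample values and the 0.2 threshold are all assumptions for illustration.

```python
from collections import Counter


def value_shares(rows, dimension):
    """Share of sessions for each value of the given dimension."""
    counts = Counter(row[dimension] for row in rows)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def overrepresented(spike_rows, normal_rows, dimension, min_increase=0.2):
    """Values whose share of sessions grew by at least min_increase."""
    spike = value_shares(spike_rows, dimension)
    normal = value_shares(normal_rows, dimension)
    gains = {v: share - normal.get(v, 0.0) for v, share in spike.items()}
    flagged = [v for v, gain in gains.items() if gain >= min_increase]
    return sorted(flagged, key=lambda v: -gains[v])


# Hypothetical exports: one row per session, for a normal and a spike period.
normal_rows = [{"browser_version": "Chrome 48"}] * 6 + [{"browser_version": "Firefox 44"}] * 4
spike_rows = normal_rows + [{"browser_version": "Chrome 43.0.2357"}] * 8

print(overrepresented(spike_rows, normal_rows, "browser_version"))
```

Running the same comparison for each dimension in the list above gives you a short set of candidate patterns to verify in the reports.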
Extended instructions on how to search for patterns of bot traffic.
How to Search for Common Characteristics in Bot Traffic
To find some characteristics that will help you exclude this traffic:
- Go to the reporting section of your Google Analytics and select the period when the direct traffic occurred.
- Expand Acquisition and select Channels
- Then click on Direct and then on the homepage (usually represented by a slash, /)
- Once there, start selecting different Secondary dimensions (at the top of the report)
Here are some characteristics I found in waves of direct traffic detected across several of my clients; they may coincide with your situation.
- A) July 5, 2015: old Flash versions (11.5 r502, 10.3 r183 and 13.0 r0)
- B) January 25, 2016: Chrome 43.0.2357
- C) March 2016: Service Provider Hubspot
- D) July 2016: ISPs from data centers
Cleaning Irrelevant Bot Direct Traffic
Once you find one or more patterns from the previous step (the more, the better), you can use them to create an advanced segment to exclude this traffic.
Why don't I use a filter instead? Ideally you would want to block this traffic; however, each filter allows only one condition, whereas a segment can combine several. This makes segments a lot safer to use.
However, if you find a very specific characteristic that is very unlikely to belong to a real user, go ahead and create a filter. Examples are very old browser or Flash versions, or service providers that are data centers or analytics tools.
Otherwise, you are better off creating a segment.
How to Create a Segment to Clean Direct Bot Traffic
To remove bot traffic from Google Analytics:
- Go back to the Reporting section of your Google Analytics
- Click on "+ Add Segment" at the top of any of your reports
- Click the red button "+New Segment"
- Near the bottom of the window, select Conditions. (The first 2 conditions apply to any case; the other conditions will depend on your findings.)
- Make sure Exclude is selected and set the conditions. First condition:
- Default Channel Grouping > exactly matches > Direct (then click on "AND")
- Second condition:
- Landing Page > exactly matches > / (then click on "AND")
- The third condition will depend on the pattern you found. Using some of the examples I mentioned previously:
- a) Old Flash versions: Flash Version > matches regex > 11\.5\sr502|10\.3\sr183|13\.0\sr0
- b) Hubspot provider: Service Provider > exactly matches > hubspot
- Set a meaningful name for the segment, for example "0. All Users - No bots", and Save. All traffic matching these conditions will be excluded from your reports while the segment is selected.
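Before saving the segment, it can help to sanity-check the regular expression and the AND logic outside of Analytics. The Python sketch below mimics the three conditions from the steps above on hypothetical session rows; the field names are assumptions, and it assumes "matches regex" finds the pattern anywhere in the value.

```python
import re

# The Flash version regex from condition a) above.
FLASH_PATTERN = re.compile(r"11\.5\sr502|10\.3\sr183|13\.0\sr0")


def excluded_by_segment(session):
    """True when a session meets all three conditions and would be excluded."""
    return (
        session["channel"] == "Direct"
        and session["landing_page"] == "/"
        # search() looks for the pattern anywhere in the value (an assumption
        # about how "matches regex" behaves in the segment builder).
        and FLASH_PATTERN.search(session["flash_version"]) is not None
    )


# Hypothetical sessions for a quick check.
sessions = [
    {"channel": "Direct", "landing_page": "/", "flash_version": "11.5 r502"},
    {"channel": "Direct", "landing_page": "/", "flash_version": "20.0 r0"},
    {"channel": "Organic Search", "landing_page": "/", "flash_version": "11.5 r502"},
]
kept = [s for s in sessions if not excluded_by_segment(s)]
print(f"kept {len(kept)} of {len(sessions)} sessions")
```

Because all three conditions must hold, a real visitor who happens to share one trait with the bot, such as an old Flash version on organic traffic, stays in your reports.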
Creating a segment with the 3a condition worked perfectly for some of my clients; it removed most of the unnatural direct traffic (orange).
Need help with this?
If for some reason you are not able to find the source of the unnatural traffic and need a hand, let me know! I can personally review your Analytics.
Your opinion is important
Bots usually crawl multiple sites, and it's possible that other people are having the same issue as you. By sharing your experience and findings, you may help others.