The importance of the Google Analytics hostname report
The hostname report is one of those underused reports in Google Analytics that can massively help the accuracy and data quality of your reports.
Some of the things the hostname report can help you with are:
- Identifying and preventing ghost spam (this is a big one),
- Excluding development and staging sites from your main reports,
- Identifying sites that are scraping your content.
What is a hostname in Google Analytics?
A hostname is any domain, tool, or service where your GA tracking code is present. It might be controlled by you or by an external service. An example of a hostname controlled by you is your website, where you inserted your code. An example of a hostname not controlled by you is Google Translate.
Hostname vs a source
These 2 are often confused:
- The Source is where your visit comes from, and there can be any number of them, for example, Facebook, Google, Twitter, YouTube, links from other sites to your site, etc.
- The hostname, on the other hand, is the site where the visitor arrives. Your main hostname will be your domain and, depending on the configuration of your site, there may be others.
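To make the distinction concrete, here is a minimal Python sketch. It assumes a hit can be reduced to a page URL and a referrer URL; both URLs are made up for illustration, and GA derives these dimensions on its own rather than through code like this.

```python
from urllib.parse import urlparse

# Hypothetical hit: both URLs are illustrative, not real GA data.
page_url = "https://www.carloseo.com/hostname-report/"
referrer_url = "https://www.facebook.com/some-post"

# Hostname: the site where the visitor arrives (where the tracking code ran).
hostname = urlparse(page_url).hostname

# Source: where the visit comes from (approximated here by the referrer's host).
source = urlparse(referrer_url).hostname

print(hostname)
print(source)
```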
Where is the hostname report in GA?
One of the reasons this report is usually overlooked is that it isn't on the main list of reports. If you type hostname in the search box, it won't appear.
To find the hostname report:
- Go to the reporting section, select a year or more on the calendar, then select the Audience reports in the sidebar.
- Expand Technology and click on Network
- At the top of the report (just below the graph) select Hostname as a primary dimension (by default Service Provider is selected)
Here you will see the list of the hostnames, real or fake (spam). The most important one is your domain, the rest will vary from site to site depending on the size, age, and configuration of your site.
Types of hostnames
Some of the hostnames you may find in your Analytics are:
|IPs||localhost (might also be from a scraping site)|
|Tools connected to your Analytics||YouTube, Mailchimp, etc.|
|Translate services||Google Translate|
|Cache services||Bing cache|
|Speed services||Google Weblight|
|Scraping sites||Sites that copy pages and post them exactly as they are|
|Spam||They may show the spammer URL or, to try to fool you, the URL of a known site like mashable.com, google.com, or apple.com|
|(not set) hostname||Traffic coming from spam or a code issue (More information below)|
The following screenshot shows some of the hostnames stored in the analytics property of this site.
You may think: all that is interesting, but how can I use it to improve the quality of my Analytics?
You can use this information to create a filter to allow only traffic to the hostnames you consider relevant, that way any traffic that has an invalid hostname and doesn't add any value to your Analytics will be left out.
Now that you know how to find and identify your valid hostnames, make a list of all the ones you want to include in your filter. Following the example above, these are the hostnames I consider valid.
|My valid hostnames|
Building a hostname expression for your filter
You can only create one hostname filter, so you need to fit all your hostnames in the same regular expression. The simplest way to do it is to paste them one after another, separated by a pipe character "|", like this:
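As a rough sketch of this approach in Python, the snippet below joins a list of hostnames with the pipe character. The three hostnames are the ones used in the tips further down; substitute your own list.

```python
# Hostnames considered valid (taken from the examples in this guide).
valid_hostnames = [
    "carloseo.com",
    "services.carloseo.com",
    "www.youtube.com",
]

# Join them with the pipe character, which works as OR in a regular expression.
expression = "|".join(valid_hostnames)

print(expression)
```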
However, you can simplify it a lot more. GA uses REGEX (a special text string for finding patterns) for custom filters, so you don't have to match each hostname exactly; a partial match is enough. So the expression above can be shortened like this:
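The effect of partial matching can be sketched in Python, where `re.search` also looks for the pattern anywhere in the string. The shortened expression `carloseo|youtube` is an assumption made for this illustration, and Python's regex engine is only a stand-in for the one GA uses.

```python
import re

# Assumed shortened expression (illustrative only).
short_expression = re.compile(r"carloseo|youtube")

hostnames = [
    "carloseo.com",
    "services.carloseo.com",
    "www.youtube.com",
    "mashable.com",
]

# re.search finds the pattern anywhere in the hostname (partial match).
matched = [h for h in hostnames if short_expression.search(h)]

print(matched)
```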
Of course, if you have a development environment, like "staging1.carloseo.com" in my case, the above expression would match that hostname too. This is where you have to get a bit crafty and use a more advanced regex.
The parentheses and question mark are special characters; together they basically say that the hostname has to start with "services" or with "carloseo", so any other subdomain won't be counted.
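One way such an expression could look is sketched below in Python. The pattern `^(services\.)?carloseo\.com$|youtube` is an assumption for illustration and may differ from the expression actually used for this site; the helper name `is_valid` is also hypothetical.

```python
import re

# Assumed pattern: the caret anchors the start, and "(services\.)?" makes the
# "services." prefix optional, so only these two carloseo hosts can match.
pattern = re.compile(r"^(services\.)?carloseo\.com$|youtube")


def is_valid(hostname: str) -> bool:
    """Return True if the hostname would pass the include expression."""
    return pattern.search(hostname) is not None


for host in [
    "carloseo.com",
    "services.carloseo.com",
    "staging1.carloseo.com",
    "www.youtube.com",
]:
    print(host, is_valid(host))
```

With this pattern the staging hostname is rejected, while the main domain, the services subdomain, and the YouTube hostname pass.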
Here are some basic tips to help you build your expression:
- To match your hostnames exactly, add a caret ^ at the beginning and a dollar sign $ at the end of each hostname, like this: ^carloseo.com$|^services.carloseo.com$|^www.youtube.com$
- To separate each hostname, use a bar or pipe character |, which works as OR. If you can't find it on your keyboard, hold Alt and type 124 on the numeric pad.
- The dot . is a special character in REGEX (it matches any character), so normally you would add a backslash \ before it. However, in most cases this is not needed for this type of filter. The hyphen - is only special inside square brackets.
- Try to find a good way to match as many hostnames as you can. For example, if you want to match blog.carloseo.com, es.carloseo.com, and www.carloseo.com, you don't need to add all of them to the expression; entering carloseo is enough to match all of them.
- Domains don't contain spaces, so don't leave any in your expression.
- IMPORTANT! The REGEX in GA has a limit of 255 characters. If your expression exceeds this limit, try to optimize it to keep everything under one expression, because you can only have 1 Include hostname filter.
- IMPORTANT! Don't add a pipe/bar | at the beginning or the end of the expression. An empty alternative matches every hostname, so the filter would let everything through.
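Some of these tips can be checked automatically before you paste the expression into GA. The Python sketch below is a hypothetical helper (the name `check_expression` is an assumption, not part of GA) that covers only the length limit, stray spaces, and leading or trailing pipes.

```python
MAX_PATTERN_LENGTH = 255  # character limit for the filter pattern, as noted above


def check_expression(expression: str) -> list:
    """Return a list of problems found in a hostname expression."""
    problems = []
    if len(expression) > MAX_PATTERN_LENGTH:
        problems.append("longer than 255 characters")
    if " " in expression:
        problems.append("contains a space")
    if expression.startswith("|") or expression.endswith("|"):
        problems.append("starts or ends with a pipe")
    return problems


print(check_expression("carloseo|youtube"))
print(check_expression("|carloseo| youtube"))
```

An empty list means none of these three problems was found; it does not prove that the expression matches the right hostnames.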
In this post, you can find more about regular expressions.
How to test your hostname expression
It's important that you add all your relevant hostnames, or you will lose valid data. To make sure your filter will work as expected, you can test it using one of the following:
- Using a quick segment in GA. This will let you see live how your filter will behave directly on your reports.
- Using a regex test tool like regex101.com, here is an example using the latest expression I created.
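If you prefer to test offline, a small Python script can play a similar role to a regex tester. The sketch below assumes the anchored expression from the tips above (with the dots escaped) and a hand-typed list of hostnames; copy the real ones from your hostname report. Python's `re` module is only a stand-in for GA's matcher, though simple patterns like this one should behave the same way.

```python
import re

# Expression with every hostname anchored and the dots escaped.
expression = re.compile(
    r"^carloseo\.com$|^services\.carloseo\.com$|^www\.youtube\.com$"
)

# Hostnames as they might appear in the report (illustrative list).
observed = [
    "carloseo.com",
    "services.carloseo.com",
    "staging1.carloseo.com",
    "www.youtube.com",
    "mashable.com",
    "(not set)",
]

# Hostnames the include filter would keep, and those it would drop.
kept = [h for h in observed if expression.search(h)]
dropped = [h for h in observed if not expression.search(h)]

print("kept:", kept)
print("dropped:", dropped)
```

If a hostname you care about ends up in the dropped list, adjust the expression before creating the filter.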
Do you need help building your hostname expression? I can help you
Creating a filter to include only valid hostnames
You are almost there! All this reading and work will soon be rewarded.
Once you have your expression fully tested, it's time to create the "include hostname filter" that will help you get rid of all the toxic traffic that skews your reports.
How to create a valid hostname filter for ghost spam and dev sites
On your Google Analytics:
- Go to the Admin tab, and select the view where you want to apply the filter. If you follow the naming above, this will be the Master view or Test view.
- Select Filters under the View column, and select + Add Filter
- Enter as a name for the filter Include Valid Hostnames.
- Configure the filter as follows:
- Filter Type: Custom > Include
- Filter Field: Hostname
- In the Filter Pattern box copy the hostname expression that you built before.
- [optional] You can click on Verify this filter for a quick glance of how the filter will work. You should only see spam or irrelevant hostnames on the left side of the preview table.
If you get this message: "This filter would not have changed your data. Either the filter configuration is incorrect, or the set of sampled data is too small"
It is probably because of the limited data used by this feature. Try verifying it with a quick segment (if you haven't done it yet).
- After making sure your filter is ok, Save the filter.
IMPORTANT: This filter doesn't require regular updates, but it's essential to update the expression whenever you add the tracking ID to a new service, tool, or domain.
Wrapping it up
The hostname report can greatly help you to increase the quality of your data. With the information given there, you can create a solid filter that will only allow valuable data to pass.
Depending on the configuration and size of your site, the filter might be more or less difficult to configure. However, the results are worth the time invested in preparing one of the most important filters you could add to your Analytics.
What else can I do to improve my Google Analytics data
Adding the hostname filter will have a great impact on your Analytics data. Here are other guides that can help you even more:
- Filtering internal traffic; the static way or dynamically,
- Filtering Google Analytics spam and bots,
- Consolidating Facebook referrals.
Do you have any questions or feedback?
I've tried to cover all the important information in this guide. However, if there is any part of the guide where you had difficulties, please let me know in the comments section below.
If this article helped you, consider sharing it or leaving a comment below about your experience. It may help other people!