Regular Expressions (REGEX) in Google Analytics
Regular Expressions are a powerful way of simplifying some tedious tasks in Google Analytics.
What is a Regular Expression (REGEX)?
A REGEX, or Regular Expression, is a special string of text used as a search pattern. You can think of regular expressions as a more powerful kind of wildcard.
They can be as simple as this example, which matches all file names ending in .txt, such as text.txt or comments.txt:
.*\.txt
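As a rough illustration, the pattern can be tried outside Google Analytics with Python's `re` module. This is only a sketch: the file names are made up for the example, and `fullmatch` is used so that the whole name has to fit the pattern.

```python
import re

# Pattern from the text: any characters, followed by a literal ".txt".
pattern = re.compile(r".*\.txt")

# Sample file names (invented for this example).
filenames = ["text.txt", "comments.txt", "image.png", "notes.txt.bak"]

# fullmatch requires the entire name to match, so the name must end in ".txt".
matches = [name for name in filenames if pattern.fullmatch(name)]
print(matches)
```

With a partial match (`pattern.search`) the name `notes.txt.bak` would also be accepted, because it contains `.txt`; adding `$` to the end of the pattern is the usual way to avoid that.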
Or they can be more complex, like this one that matches a password with a minimum of eight characters and at least one letter, one number and one special character:
^(?=.*[A-Za-z])(?=.*\d)(?=.*[$@$!%*#?&])[A-Za-z\d$@$!%*#?&]{8,}$
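To see what the password expression accepts and rejects, here is a small Python sketch. The function name `is_valid_password` and the sample passwords are assumptions made for this example. Note that the expression relies on lookaheads (`(?=...)`), which Python's `re` module supports but which not every regex engine does.

```python
import re

# The password expression from the text, unchanged.
PASSWORD_RE = re.compile(
    r"^(?=.*[A-Za-z])(?=.*\d)(?=.*[$@$!%*#?&])[A-Za-z\d$@$!%*#?&]{8,}$"
)


def is_valid_password(candidate):
    """Return True when the candidate satisfies every rule in the pattern."""
    return PASSWORD_RE.search(candidate) is not None


# Sample passwords (invented for this example).
samples = ["abc12345", "abcdefg!", "abc1234!", "a1!"]
results = {password: is_valid_password(password) for password in samples}
print(results)
```

Only `abc1234!` passes: the first sample has no special character, the second has no number, and the last is shorter than eight characters.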
Don't worry if it looks like you have your work cut out for you; the most common uses of REGEX in Google Analytics are relatively easy.
Special Characters
Although there are a few differences in how REGEX works in different applications, the basic rules are the same across all of them. So, if you learn how to use regex for GA, you will also be able to use it in other scenarios.
Some characters have a special meaning in a regular expression. To match one of them as a literal character, you have to escape it with a backslash \.
Character | Meaning | Example |
| | The pipe or bar character is used as OR | ohow|blog matches ohow OR blog |
. | The dot matches any single character (letter, number or symbol) | ohow.co matches ohow.co, ohow5co, ohow$co |
^ | Marks how the pattern should start | ^ohow matches ohow.co but not www.ohow.co |
$ | Marks how the pattern should end | ohow$ matches www.ohow but not ohow.co |
() | Used for grouping | (blog|es).ohow matches blog.ohow.co and es.ohow.co |
\ | Turns a special character into a normal one | www\.ohow\.co matches only www.ohow.co |
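The examples in the table can be checked with a short Python sketch. Python's `re.search` is used here as a stand-in because it looks for the pattern anywhere in the text unless `^` or `$` is present; the helper name `matches` is an assumption for this example, and GA's own engine may differ in details.

```python
import re


def matches(pattern, text):
    """Return True when the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


# (pattern, text, expected result) taken from the table above.
checks = [
    (r"ohow|blog", "blog", True),
    (r"ohow.co", "ohow5co", True),
    (r"^ohow", "ohow.co", True),
    (r"^ohow", "www.ohow.co", False),
    (r"ohow$", "www.ohow", True),
    (r"ohow$", "ohow.co", False),
    (r"(blog|es).ohow", "es.ohow.co", True),
    (r"www\.ohow\.co", "www5ohow5co", False),
]

results = [matches(pattern, text) == expected for pattern, text, expected in checks]
print(all(results))
```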
To use any of the following characters literally, put a backslash \ in front of it (this is called escaping):
. ^ $ * + - ? ( ) [ ] { } \ |
For example, if you want to match the URL www.ohow.co, you need to escape the dots so that each one is treated as a literal dot, like this: www\.ohow\.co. If you don't escape it, the dot will match any character, so the expression would also match, for example, www5ohow5co.
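If you prepare your expressions with a script, escaping does not have to be done by hand. A minimal sketch in Python, assuming you build the expression before pasting it into GA, uses the standard `re.escape` function:

```python
import re

hostname = "www.ohow.co"

# re.escape puts a backslash in front of every special character.
escaped = re.escape(hostname)
print(escaped)

# Unescaped, each dot matches any character, so this look-alike is accepted.
unescaped_hit = re.search(hostname, "www5ohow5co") is not None

# Escaped, the dots are literal and the look-alike is rejected.
escaped_hit = re.search(escaped, "www5ohow5co") is not None

print(unescaped_hit, escaped_hit)
```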
Regular Expressions in Google Analytics
Regular Expressions can be used in Filters and Segments in almost any scenario. One example is including multiple IPs, or a range of IPs, in a Filter to exclude internal traffic.
More recently, another common use is stopping spam with an expression that includes all your hostnames.
If you are not familiar with the term, the hostname is the domain on which the visit was recorded in your GA. Let's say you want to include all the following in one expression:
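- www.ohow.co
- ohow.co
- blog.ohow.co
- translate.googleusercontent.com
- youtube.com
- ohow.co.googleweblight.com
- paypal.com
- webcache.googleusercontent.com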
You can use different approaches for including all of them in one expression:
^www\.ohow\.co$|^ohow\.co$|^blog\.ohow\.co$|^translate\.googleusercontent\.com$|^youtube\.com$|^ohow\.co\.googleweblight\.com$|^paypal\.com$|^webcache\.googleusercontent\.com$
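An exact expression like this one is tedious to type by hand, so it can help to generate it. The sketch below is one possible way to do that in Python, not an official GA tool: the hostnames are the ones that appear in the expression, and the function name and the spam hostname are made up for the example.

```python
import re

# Hostnames taken from the expression in the text.
hostnames = [
    "www.ohow.co",
    "ohow.co",
    "blog.ohow.co",
    "translate.googleusercontent.com",
    "youtube.com",
    "ohow.co.googleweblight.com",
    "paypal.com",
    "webcache.googleusercontent.com",
]

# Escape each hostname, anchor it with ^ and $, and join the alternatives with |.
expression = "|".join("^" + re.escape(host) + "$" for host in hostnames)
hostname_re = re.compile(expression)


def is_valid_hostname(hostname):
    """Return True only for hostnames that match one alternative exactly."""
    return hostname_re.search(hostname) is not None


print(expression)
print(is_valid_hostname("blog.ohow.co"))
print(is_valid_hostname("ohow.co.spam.example"))  # hypothetical spam hostname
```

Because every alternative is anchored at both ends, a hostname that only contains one of the valid names is not accepted.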
But if you don't need to be that specific, the expression could be simplified by finding some patterns, for example, ohow and googleusercontent. Also, the TLDs can be removed. In that case, the expression can be hugely simplified like this:
ohow|googleusercontent|youtube|paypal
Which approach to use depends on how much precision you need: the shorter expression also matches any other hostname that merely contains one of those words.
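The trade-off between the two approaches can be seen by running both against the same hostnames. This is a sketch in Python; `fake-paypal.example` is a hypothetical hostname invented to show the difference.

```python
import re

# The two expressions from the text: exact hostnames versus shared keywords.
strict = re.compile(
    r"^www\.ohow\.co$|^ohow\.co$|^blog\.ohow\.co$"
    r"|^translate\.googleusercontent\.com$|^youtube\.com$"
    r"|^ohow\.co\.googleweblight\.com$|^paypal\.com$"
    r"|^webcache\.googleusercontent\.com$"
)
simple = re.compile(r"ohow|googleusercontent|youtube|paypal")

valid = ["www.ohow.co", "blog.ohow.co", "ohow.co.googleweblight.com", "paypal.com"]
lookalike = "fake-paypal.example"  # hypothetical hostname, not from the text

# Both expressions accept the valid hostnames.
strict_ok = all(strict.search(host) is not None for host in valid)
simple_ok = all(simple.search(host) is not None for host in valid)

# Only the simplified expression accepts the look-alike.
strict_lookalike = strict.search(lookalike) is not None
simple_lookalike = simple.search(lookalike) is not None

print(strict_ok, simple_ok, strict_lookalike, simple_lookalike)
```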
Tips for REGEX in Google Analytics
- Don't put a bar | at the beginning or the end of the expression. It adds an empty alternative, which basically means "OR everything else", so the expression will match anything.
- Try to find a short pattern that covers all the requirements. For example, if you want to match blog.ohow.co, es.ohow.co and www.ohow.co, you don't need to add all of them to the expression; simply use ohow.
- If you are using URLs, don't leave spaces.
- The expression is NOT case sensitive unless you check the case-sensitive box.
- REGEX in GA has a limit of 255 characters. If your expression exceeds this limit, first try to optimize it so that everything stays in one expression. If that is not possible, you can split it, except for INCLUDE filters: you can only have one of those (of each type).
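Several of these tips can be checked automatically before an expression is pasted into GA. The following Python sketch is only a rough helper under the assumptions stated in its comments; the function name `lint_expression` is invented, and the 255-character limit is the figure quoted above.

```python
import re

MAX_LENGTH = 255  # character limit quoted in the tips above


def lint_expression(expression):
    """Return a list of warnings for an expression (rough checks only)."""
    warnings = []
    # A leading or trailing bar adds an empty alternative that matches anything.
    # This simple check does not recognise an escaped bar (\|) at the end.
    if expression.startswith("|") or expression.endswith("|"):
        warnings.append("leading or trailing |")
    if " " in expression:
        warnings.append("contains a space")
    if len(expression) > MAX_LENGTH:
        warnings.append("longer than 255 characters")
    return warnings


# An empty alternative matches every hostname, including unwanted ones.
matches_everything = re.search(r"ohow|", "spam.example") is not None

# GA ignores case by default; re.IGNORECASE imitates that behaviour here.
ignores_case = re.search(r"ohow", "WWW.OHOW.CO", re.IGNORECASE) is not None

print(lint_expression("ohow|blog|"))
print(lint_expression("ohow|blog"))
print(matches_everything, ignores_case)
```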
Here you can find more information about Regular Expressions in Google Analytics.