To make this guide easier to follow, I split it into 4 parts. The first 2 are practical; the last 2 are informative.

Author

Analytics & GTM Developer

Optimizer Troubleshooter

Category | Google Analytics

Remove spam from your Google Analytics historical data

The spam that is already stored in your Analytics (or any data for that matter) can't be permanently deleted. That is why it is important to also create filters to stop receiving junk traffic.

However, you can still clean your past data affected by spam using a segment.

To help you get started, I created a segment template that you just need to fill in with your "valid hostname expression" and the pre-built expressions I prepared for you.

Clean segment template for Google Analytics

  • Import the template from the Google Analytics gallery. (Don't use Safari, or the import won't work. In general, I recommend using Chrome when working with GA; it just works better.)
  • Select the view where you want to import the segment and click Create.
  • Fill the placeholders with the pre-built expressions listed below
    You will need to add or remove conditions to the segment depending on the expressions you will use for your analytics.

    • To remove a condition, click the minus (-) sign to the right of the corresponding field.
    • To add a condition, click the OR button in the exclude section, select the field you want to add in the first drop-down, select "matches regex" in the second drop-down, and paste the expression into the text box.
  • Once everything is set, save the segment. Whenever you need to analyze "polluted" data, you can select the segment from your segment list.
a. Standard expressions

1 Source
(brateg|budilneg|buketeg|bezlimitko|biteg|boltalko|begalka|alfabot|arendovalka|bank\-rot|abcdefh|aptechko|bukleteg|abc)\.xyz|(magnet\-to\-torrent|torrent\-to\-magnet)\.com|(baixar|descargar)\-musica|wordpress(\-start|\-crew)|uptime(robot|bot|check|\-alpha|\.com)|vitaly|sharebutton|semalt|ranksonic|share\-button|anticrawler|timer4web|free\-video\-tool|responsive\-test|dogsrun|fix\-website\-er|dailyrank|sitevaluation|seo\-2\-0\.|99seo|top10\-way|(videos|buttons)\-for\-your|best\-seo\-(solution|offer)|buttons\-for\-website|profit\.xyz|dbutton|keywords\-monitoring|platezhka|7makemoney|forum69|kings\-analytics|checkpagerank|pr\-cy\.ru|\-\-(production|website|sale)\.com|(audit|dollars|success|top1|amazon|commerce)\-seo|free\-video\-tool|datract|hacĸer|ɢoogl|slifty\.github|\-liar.ru|3\-letter\-|rencer\.ru|foxweber|free\-fbook|goodwriterssales|tourcroatia|spinnerco|justkillingti|suralink|worldtraveler|oldfaithfultaxi|christopherlane|hollywoodweeklymagazine|losangeles\-ads|anniemation|timdreby|pcimforum|yellowstonesafaritours|autoseo|blogarama|for\-placing|brainwizard|casinos4|ḷ\.com|davidsbag|bestonwardticket|presleycollectiblesm|\-backlinks\.com|phoenicx\.co\.uk|be\-escorts|vidyoze|brasseriebread|helvetiiconsulting|johntrapane|cloudsendchef|theautoprofit|:8888|blog1989|incomekey|amazon\-ads\.ovh|krumble\.net|10bestseo|seo\-watch|blog100|seoservices2018|resell\-seo|auto\-?seo|mycheaptraffic|bestbaby\.life|lyfeijiu|yycbtb|tqwh\.net|xtraffic\.plus|xtrafficplus|(christopherblog|tammyblog|billyblog|georgeblog|samanthablog)\.online|(penzu|blogping|blogseo|broderickblog|monicablog)\.xyz|(artblog|howblog|kimberlyblog|seobook|merryblog|axcus|dotmass|artstart|dorothea|artpress|matpre|ameblo|freeseo|jimto|seo-tips|hazblog|overblog|squarespace|ronaldblog|c\.g456|zz\.glgoo|harriett|webedu|barbarahome|annaeydlish|blog2019|compliance-john|compliance-julianna|constanceonline|galblog|greatblog|josephineblog|onlineblog|marketingblog|rosemarie|johnthompson|annierainey|mosesyamtal|candymyers|
wikidot|bravenet|daisye|donaldblog|kevblog|livejournal|nancyblog|raymondblog|samlaurabrown|space2019|stylecaster|teresablog|veronicablog|wallinside|verabauer|deirdre|ninacecillia|reginanahum|deniseconnie|firstblog|maxinesamson)\.top|easy-website\-traffic|free\-website\-traffic|(traffic|bot|website)-?(bot|traffic|website|4free)
2 Keyword
(internet|bot|traffic)\-?(space|box|art|star|fit|now|traffic|bot|website|4free)|tinyurl\.com|shorturl\.at|cutt\.ly|bit\.ly|rb\.gy
3 Language
\s[^\s]*\s|.{15,}|\.|,|^c$
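
Before pasting an expression into the segment, it can help to try it against a few sample values. The sketch below does this for the Language expression in Python. It assumes that the segment's "matches regex" condition behaves like a partial match (`re.search`); the function name and the sample values are made up for illustration.

```python
import re

# Language expression copied from the list above.
LANGUAGE_SPAM = re.compile(r"\s[^\s]*\s|.{15,}|\.|,|^c$")


def is_spam_language(value: str) -> bool:
    """Return True when a Language value matches the spam expression.

    Assumption: "matches regex" is treated as a partial match (re.search).
    """
    return LANGUAGE_SPAM.search(value) is not None


# Hypothetical sample values, not taken from real reports.
samples = ["en-us", "es", "(not set)", "c", "secret.google.com You are invited!"]
for value in samples:
    label = "spam" if is_spam_language(value) else "ok"
    print(f"{value!r}: {label}")
```

Normal language codes such as "en-us" pass, while values that contain a dot or comma, two or more whitespace characters, 15 or more characters, or just the letter "c" are flagged.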
b. Special expressions

If you use third-party tools that send data to your Google Analytics via the Measurement Protocol (e.g., CallRail), test the expressions below before using them.

1 Browser Size
(not set)
2 Valid Hostnames

Remember the hostname regex is the same one you created for the hostname filter. Here are the instructions to create a valid hostname expression in case you missed it.

This is the only expression that goes in the INCLUDE section of the segment

your-valid-hostname-expression
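
The way the INCLUDE and EXCLUDE sections combine can be sketched in Python. This is a simplified model, not how Google Analytics evaluates segments internally. The hostname expression `(^|\.)example\.com$` is a hypothetical placeholder, the Source and Keyword patterns are shortened excerpts of the expressions above, and the dictionary field names and sample sessions are assumptions for illustration.

```python
import re

# Hypothetical hostname expression; replace it with your own.
VALID_HOSTNAME = re.compile(r"(^|\.)example\.com$")

# Shortened excerpts of the expressions listed above, kept brief for
# readability. Use the full expressions in the real segment.
EXCLUDE = {
    "source": re.compile(r"semalt|ranksonic|sharebutton|auto\-?seo"),
    "keyword": re.compile(r"tinyurl\.com|shorturl\.at|cutt\.ly|bit\.ly|rb\.gy"),
    "language": re.compile(r"\s[^\s]*\s|.{15,}|\.|,|^c$"),
}


def in_clean_segment(session: dict) -> bool:
    """Keep a session only if the hostname is valid and no exclude rule matches."""
    if not VALID_HOSTNAME.search(session.get("hostname", "")):
        return False
    return not any(
        pattern.search(session.get(field, ""))
        for field, pattern in EXCLUDE.items()
    )


# Hypothetical sessions for illustration.
sessions = [
    {"hostname": "www.example.com", "source": "google", "keyword": "", "language": "en-us"},
    {"hostname": "www.example.com", "source": "semalt.com", "keyword": "", "language": "en-us"},
    {"hostname": "(not set)", "source": "google", "keyword": "", "language": "en-us"},
]
clean = [s for s in sessions if in_clean_segment(s)]
print(f"{len(clean)} of {len(sessions)} sessions kept")
```

The hostname check is the only include rule, so a session with an invalid hostname is dropped even when none of the exclude expressions match.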
c. Historic expressions

The following expressions are only needed if you are analyzing data before February 2020.

1 Network Domain
paloaltonetworks|scaleway|kcura|^google(\.com$|usercontent\.com|bot\.com)$
2 Page Title
google-liar|whitehouse\.gov|life\.ru|vice\.com|vc\.ru|rencer\.ru|blackhatworld
3 Service Provider 1
hubspot|^google\sllc$|^google\sinc\.$|alibaba\.com\sllc|ovh\shosting\sinc\.|microsoft\scorp|facebook\sireland\sltd|online\ssas|evercompliant|early\sregistration\saddresses|inktomi\scorporation|google\scorporate|google\sswitzerland\sgmbh|kazooisyee|cloud69|vultr\sholdings|hos\-329450|internet\ssecurity\s\-|secure\sinternet\sllc|versia\sltd|altushost\ssweden\snetwork|web4africa\s\-ng|altushost\sluxembourg\snetwork|gz\ssystems\slimited\s\-|hostroyale\sportugal|gz\ssystems\slimited\s\-|north\sstar\sinformation\shi\.tech|putian\scity\sfujian
4 Service Provider Special
chinanet\sfujian|putian\scity\sfujian|amazon\.com\sinc\.|amazon\stechnologies\sinc\.|linode\sllc|linode|digitalocean\sllc|amazon\sdata\sservices
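
As a rough illustration of how the historic expressions apply only to older data, the sketch below checks the Network Domain expression against sessions dated before February 2020. The cutoff date of 2020-02-01, the function name, and the sample domains are assumptions for illustration; partial-match semantics (`re.search`) are assumed as well.

```python
import re
from datetime import date

# Network Domain expression copied from the list above.
NETWORK_DOMAIN_SPAM = re.compile(
    r"paloaltonetworks|scaleway|kcura|^google(\.com$|usercontent\.com|bot\.com)$"
)

# Assumed reading of "before February 2020".
HISTORIC_CUTOFF = date(2020, 2, 1)


def is_historic_spam(session_date: date, network_domain: str) -> bool:
    """Apply the Network Domain expression only to sessions before the cutoff."""
    if session_date >= HISTORIC_CUTOFF:
        return False
    return NETWORK_DOMAIN_SPAM.search(network_domain) is not None


# Hypothetical examples.
print(is_historic_spam(date(2019, 5, 1), "googlebot.com"))
print(is_historic_spam(date(2019, 5, 1), "mail.google.com"))
print(is_historic_spam(date(2020, 3, 1), "googlebot.com"))
```

Note that the google alternatives are anchored at both ends, so they only match the exact domains google.com, googleusercontent.com, and googlebot.com, while the other alternatives match anywhere in the value.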

After saving the segment, you will be able to see spam-free reports, as long as the segment is selected.

Do you have any questions or feedback?

I've tried to cover all the important details in this guide. However, if you run into difficulties with any part, please let me know in the comments section below.

If this article helped you, please consider sharing it or leaving a comment below about your experience; it just might help other people! :)

Need help implementing, configuring, and/or protecting your Google Analytics, or with any other Google Analytics configuration or customization? I can help!