Tuesday, August 30, 2016
White Hat vs Black Hat - Web Analytics and non-human traffic
Publishers and marketers have to deal with the non-human traffic (bots) on their websites that acts like a real person, for the sake of the long-term sustainability of the medium. The first evidence of this was the “Comment Bots” that invaded any website with a comments feature, leaving fake comments or product pitches. The 2015 Global Bot Traffic Report, published by Imperva Incapsula, an Internet security company, estimated that 51.5% of traffic on the Internet comes from humans, with Good bots at 19.5% and Bad bots at 29%. The report boasts that this is the first time human traffic was higher than bot traffic, which suggests that bot traffic was more than 50% in the past. That is a red flag, especially if you are in the business of selling or buying ad impressions.


A bot is a web robot that performs a specific task repetitively and at computer speed, far faster than a human could do it manually. The Good Bots (White Hats) can be a web spider that comes by and indexes all your content for the search engines, or a web scraper that takes your content and republishes it on another website, as shopping comparison sites do. Imperva Incapsula estimates that Google Bots visit a website 187 times a day and account for 60% of all search engine spider traffic.


The Bad Bots (Black Hats) can generate fake traffic and links on websites as part of a click fraud business model, or they can be sniffers/spiders looking for security gaps (old code) to loot credit card numbers or email addresses. It appears that a lot of this Black Hat activity comes from scammers in online ad networks and exchanges who run fake websites or legitimate ones that have been hijacked. You can buy millions of bot visitors on the Internet from traffic-generating companies disguised as website testing solutions. Check out a search for “Traffic Generator Bot”; it was an eye-opener for me.


In the media business all this bot traffic is not good, as the expectation is that every visit to a site comes from a real person. You will never get repeat advertising business if the click on your ad was from a computer, because it will not generate the expected outcome, which is a sale of your client’s products. How are the web analytics companies dealing with this issue? Google Analytics, which most publishers use (because it's free), introduced a Bot filter in its analytics software in July 2014. This filter, a checkbox option in the Analytics dashboard, excludes all hits from known bots and spiders on the IAB/ABC International Bots and Spiders List. This list is available by subscription and includes a White Hat (Good Bots) list and a Black Hat (Bad Bots) list that can be incorporated into web analytics software.
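To make the idea of list-based bot filtering concrete, here is a minimal sketch in Python. It is not how Google Analytics implements its filter. The bot list entries, the `Hit` record and the function names are all hypothetical, since the real IAB/ABC list is only available by subscription.

```python
from dataclasses import dataclass

# Hypothetical sample entries; the real IAB/ABC list is subscription-only.
KNOWN_BOT_AGENTS = ("googlebot", "bingbot", "trafficbot")
KNOWN_BOT_IPS = {"203.0.113.7"}


@dataclass
class Hit:
    ip: str
    user_agent: str
    page: str


def is_known_bot(hit: Hit) -> bool:
    """Return True if the hit matches an entry on the (hypothetical) bot list."""
    agent = hit.user_agent.lower()
    return hit.ip in KNOWN_BOT_IPS or any(name in agent for name in KNOWN_BOT_AGENTS)


def filter_human_hits(hits: list[Hit]) -> list[Hit]:
    """Keep only the hits that do not match the bot list."""
    return [hit for hit in hits if not is_known_bot(hit)]


hits = [
    Hit("198.51.100.4", "Mozilla/5.0 (Windows NT 10.0)", "/home"),
    Hit("198.51.100.9", "Mozilla/5.0 (compatible; Googlebot/2.1)", "/home"),
    Hit("203.0.113.7", "Mozilla/5.0 (X11; Linux x86_64)", "/ads"),
]
human_hits = filter_human_hits(hits)
print(len(human_hits), "of", len(hits), "hits kept")
```

A list like this only catches bots that identify themselves or that come from addresses already on the list, which is why it is a starting point rather than a full defence.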

comScore, another digital media measurement company, announced in August of the same year that it had acquired MdotLabs, a software solution that fights Bad Bots on publishers’ websites. It uses cyber-security methodologies to identify a variety of Black Hat activities, including bots, click farms and traffic generation techniques. The software was incorporated into comScore’s Media Metrix solution, and when used, the solution showed a 35% decrease in web traffic. That 35% is in the same range as the 29% Bad Bot traffic estimate cited earlier, which helps validate the data.

Then, in October 2015, the Media Rating Council (MRC), an organization for the media measurement industry, released its Invalid Traffic Detection and Filtration Guidelines as part of its mandate to provide standardized industry best practices to root out non-human traffic and to combat ad fraud. So it seems the controls are in place in the industry, but as anybody in cyber security knows, it is an endless cat and mouse game with no finish line in sight.

Richard Murphy, Sr. VP of Auditing at the BPA, offers the following recommendations for publishers to minimize their exposure to unwanted activity on their websites:

1) Know your traffic sources. Most of the invalid traffic comes from sourced or purchased traffic. Organic traffic is generally pretty clean;

2) Know your partners. Use partners you know and trust. There are many links in the online transaction chain and you are only as strong as your weakest link;

3) When possible, use certified technology platforms and solutions (MRC, IAB, TAG, CCAB/BPA, AAM). These companies have voluntarily opened up their operations to independent third-party review of industry best practice compliance.
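As a rough illustration of the first recommendation, a publisher could compare the share of invalid visits across traffic sources. The sketch below assumes each visit has already been labelled with its source and flagged as valid or invalid by some bot filter. The sample data and the function name are hypothetical and are not taken from any audit.

```python
from collections import defaultdict

# Hypothetical sample data: (traffic source, flagged as invalid by a bot filter)
visits = [
    ("organic", False), ("organic", False), ("organic", False), ("organic", True),
    ("purchased", True), ("purchased", True), ("purchased", False), ("purchased", True),
]


def invalid_share_by_source(visits):
    """Return the fraction of visits flagged as invalid for each traffic source."""
    totals = defaultdict(int)
    invalid = defaultdict(int)
    for source, is_invalid in visits:
        totals[source] += 1
        if is_invalid:
            invalid[source] += 1
    return {source: invalid[source] / totals[source] for source in totals}


shares = invalid_share_by_source(visits)
for source, share in sorted(shares.items()):
    print(f"{source}: {share:.0%} invalid")
```

A source whose invalid share sits well above the others is the one to question first with the partner who supplied it.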

The problem I find with web analytics data is that I need to sort through all of it to find what is relevant to the expected marketing outcome, generating sales for my brand, and I can sometimes waste my time on irrelevant data. Are Facebook likes and clicks true indicators of campaign performance, or should we rely on trusted methods like managing the response through actual customer contact across a variety of entry points: in person, on the phone, mail, email, contact forms on the website and social media?

Brian Gillett of Target Audience Management Inc. (TAMI), an audience circulation expert, suggests that publishers focus on these metrics: address change requests, email newsletter sign-ups, subscriber sign-ups and requests for more information. All other statistics should be taken as a relative measure, because the growing use of privacy software (ad blocking), permission-based use of cookies, traffic/click fraud generation techniques and technical flaws all affect the data that goes into the analytics. Brian also notes that newsletter open tracking is sometimes blocked at the destination by privacy software, and that opens are sometimes recorded that are not really opens because of technical issues in the tracking software.

Bots are now evolving into virtual people. They no longer just do simple repetitive tasks, and the tech community is hyping bots that will have artificial intelligence as part of their make-up. There is a race to create the bot that automates order taking, customer service or conversations through Messaging Bots on smartphones. That’s right: the next time you order take-out, you will be talking with a Message Bot with artificial intelligence plus machine learning. Facebook released a beta version of its Message Bot in April of this year and is now receiving applications for bot apps for the messaging system. The movement towards a personal one-on-one relationship with customers at mass-market scale is happening in the digital world as the technology arrives, but unfortunately it looks like the only real person in this future relationship is you.

- Martin Seto
About Me
Martin Seto

Martin Seto is the producer of the Canadian Online Publishing Awards (COPAS) with 30 years of life experience in technology, advertising, media and creative exploration. He can be reached at marty(dot)seto(at)reflexmediasales.com or 416-907-6562, and on LinkedIn.
