General Invalid Traffic Filtration Procedures


Triton Digital employs techniques based on identifiers, activity, and patterns in log file data in an attempt to identify and filter (exclude) invalid activity, including but not limited to known and suspected non-human activity and suspected invalid human activity. However, because user identification and intent cannot always be known or discerned by the publisher, advertiser, or their respective agents, it is unlikely that all invalid activity can be identified and excluded from report results. Details on our techniques are described below.

Invalid or Corrupted Log Data

Sessions or listener tracking pings that do not conform to the required format are treated as invalid or corrupt data and are excluded from reported metrics. Publishers are provided with documentation on the required listener tracking ping and third-party CDN session formats. It is the publishers’ responsibility to implement these formats as required for proper data collection.
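The required formats are defined in the publisher documentation and are not reproduced here. As a minimal sketch only, the following Python shows how entries that fail basic format checks could be dropped before aggregation. The field names in REQUIRED_FIELDS and the specific checks are hypothetical placeholders, not the actual Triton Digital formats.

```python
import ipaddress
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical field names; the real ping and CDN session formats are
# specified in the publisher documentation, not in this sketch.
REQUIRED_FIELDS = ("session_id", "ip", "user_agent", "duration_seconds")


@dataclass
class SessionRecord:
    session_id: str
    ip: str
    user_agent: str
    duration_seconds: float


def parse_record(raw: dict) -> Optional[SessionRecord]:
    """Return a SessionRecord, or None if the entry is invalid or corrupt."""
    # Every required field must be present.
    if any(field not in raw for field in REQUIRED_FIELDS):
        return None
    # Text fields must be non-empty.
    if not all(str(raw[f]).strip() for f in ("session_id", "ip", "user_agent")):
        return None
    # The IP address must parse as IPv4 or IPv6.
    ip = str(raw["ip"]).strip()
    try:
        ipaddress.ip_address(ip)
    except ValueError:
        return None
    # Duration must be a finite, non-negative number.
    try:
        duration = float(raw["duration_seconds"])
    except (TypeError, ValueError):
        return None
    if not math.isfinite(duration) or duration < 0:
        return None
    return SessionRecord(str(raw["session_id"]), ip, str(raw["user_agent"]), duration)


def keep_valid(raw_entries: Iterable[dict]) -> List[SessionRecord]:
    # Invalid or corrupt entries are excluded from the metrics input.
    return [r for r in (parse_record(e) for e in raw_entries) if r is not None]
```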

One Minute Rule

Due to the nature of streaming activity and the general behavior of robotic/spider-related traffic, Triton Digital uses a process whereby streaming sessions with a duration of less than one minute are considered invalid and are removed from all collected measurement data. This rule reduces noise from extremely short sessions, robotic activities, and initial connectivity issues.

This rule applies to both data collection methods. When log files are provided by the CDN, sessions with a duration of less than one minute are not inserted into the database table used by Webcast Metric. When data collection is performed through the listener tracking method, a session is considered active only upon the first ping event, which occurs after 60 seconds.

Sessions less than 60 seconds in duration are excluded from both gross and net reported metrics.
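A minimal Python sketch of the one-minute rule as it might be applied to sessions reconstructed from CDN logs. The Session type and its field names are hypothetical. A session of exactly 60 seconds is kept, because only sessions shorter than one minute are excluded.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_SESSION_SECONDS = 60  # sessions shorter than one minute are invalid


@dataclass
class Session:
    session_id: str
    duration_seconds: float


def apply_one_minute_rule(sessions: Iterable[Session]) -> List[Session]:
    """Keep only sessions lasting at least one minute.

    Excluded sessions count toward neither gross nor net metrics. With
    listener tracking, the equivalent effect comes from a session only
    becoming active at its first ping, which is sent after 60 seconds.
    """
    return [s for s in sessions if s.duration_seconds >= MIN_SESSION_SECONDS]


kept = apply_one_minute_rule([Session("a", 59.9), Session("b", 60), Session("c", 300)])
print([s.session_id for s in kept])  # ['b', 'c']
```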

Robot Instruction File

Triton Digital uses a Robot Instruction File (robots.txt) in the root directory of its listener tracking and streaming servers.
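The actual contents of Triton Digital’s robots.txt files are not given in this document. The sketch below assumes a hypothetical disallow-all file and uses Python’s standard urllib.robotparser module to show how a compliant crawler would interpret it; the host name is a placeholder. Because robots.txt is advisory, it only deters crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that asks all crawlers to stay away.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler checks before fetching; every path is disallowed here.
print(parser.can_fetch("Googlebot", "https://stream.example.com/live"))  # False
```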

Specific Identification of Non-Human Activity

Triton Digital uses the IAB/ABCe International Spiders and Bots List* to exclude traffic associated with robotic activity from the collected data. For example, this filtering process allows us to exclude HTTP requests from search engine spiders (Google, Bing, Yahoo, etc.). This list is maintained by the Interactive Advertising Bureau (IAB) and updated monthly.

Triton Digital also maintains and applies additional lists to exclude invalid user agents or include known-valid ones, when those agents are not reflected in a timely manner in the IAB/ABCe International Spiders and Bots List.

* For more information, refer to: https://www.iab.com/guidelines/iab-abc-international-spiders-bots-list/
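The IAB/ABCe list is licensed and has its own file format, which this sketch does not reproduce. Assuming hypothetical stand-in patterns, the Python below shows the general shape of user-agent filtering with an exclusion list, an additional exclusion list, and a known-valid list. Giving the known-valid list precedence over the exclusion lists is an assumption made for illustration.

```python
from typing import Iterable

# Hypothetical stand-ins; none of these tuples is the real IAB/ABCe list
# or a real Triton Digital list.
IAB_BOT_PATTERNS = ("googlebot", "bingbot", "slurp")    # stand-in for the IAB/ABCe list
EXTRA_EXCLUDE_PATTERNS = ("examplecrawler",)            # hypothetical additional exclusions
KNOWN_VALID_PATTERNS = ("examplevalidplayer",)          # hypothetical known-valid agents


def _matches(user_agent: str, patterns: Iterable[str]) -> bool:
    # Case-insensitive substring match against each pattern.
    ua = user_agent.lower()
    return any(p in ua for p in patterns)


def is_robot(user_agent: str) -> bool:
    """Return True if the user agent should be excluded as robotic activity."""
    # Assumed precedence: a known-valid agent is kept even if an exclusion
    # pattern would otherwise match it.
    if _matches(user_agent, KNOWN_VALID_PATTERNS):
        return False
    return _matches(user_agent, IAB_BOT_PATTERNS) or _matches(user_agent, EXTRA_EXCLUDE_PATTERNS)


print(is_robot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # True
print(is_robot("Mozilla/5.0 (Windows NT 10.0) ExampleValidPlayer/1.0"))  # False
```

Substring matching keeps the sketch short; production lists often also need anchoring and exception rules, which are omitted here.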

Data Center Exclusion

Triton Digital uses the TAG Data Center IP address list to exclude industry-identified non-human data center traffic. For example, it filters data from Amazon data center stream monitoring systems. This list is maintained by the Trustworthy Accountability Group (TAG) and updated monthly.
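The TAG list itself is not reproduced here. Assuming it can be represented as a set of CIDR ranges, a minimal Python sketch of the exclusion uses the standard ipaddress module; the ranges below are IETF documentation blocks (RFC 5737) used purely as placeholders.

```python
import ipaddress
from typing import Iterable, List

# Placeholder ranges, not real data center addresses from the TAG list.
DATA_CENTER_RANGES = [ipaddress.ip_network(c) for c in ("192.0.2.0/24", "198.51.100.0/24")]


def is_data_center_ip(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    # Membership is False when the address and network are different IP versions.
    return any(addr in net for net in DATA_CENTER_RANGES)


def exclude_data_center_traffic(session_ips: Iterable[str]) -> List[str]:
    # Keep only sessions whose IP falls outside every listed range.
    return [ip for ip in session_ips if not is_data_center_ip(ip)]


print(exclude_data_center_traffic(["192.0.2.17", "203.0.113.5", "2001:db8::1"]))
# ['203.0.113.5', '2001:db8::1']
```

A linear scan is fine for a sketch; with a large list, sorting the ranges and using binary search would scale better.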

Activity-based Filtration

Triton Digital employs multiple levels of activity-based detection procedures to exclude data anomalies generated by invalid traffic. Existing invalid traffic detection techniques and data trends are assessed to identify potential enhancements to this suite of activity-based detection procedures.

Invalid traffic resulting from improper publisher implementations, as well as other potential sources of invalid traffic, is discussed with the publisher in an effort to remediate the underlying issue and reduce overall levels of invalid traffic.
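The specific activity-based rules are not disclosed in this document. Purely as a hypothetical illustration of the general idea, the Python sketch below flags any IP address that starts more sessions within a sliding time window than a fixed threshold. The rule and the values of WINDOW_SECONDS and MAX_SESSIONS_PER_WINDOW are made up and do not describe Triton Digital’s procedures.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Set, Tuple

# Made-up thresholds for illustration only.
WINDOW_SECONDS = 3600
MAX_SESSIONS_PER_WINDOW = 50


def flag_bursty_ips(session_starts: Iterable[Tuple[str, float]]) -> Set[str]:
    """Flag IPs starting more than MAX_SESSIONS_PER_WINDOW sessions in any window.

    session_starts holds (ip, start_timestamp_seconds) pairs sorted by timestamp.
    """
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    flagged: Set[str] = set()
    for ip, ts in session_starts:
        window = recent[ip]
        window.append(ts)
        # Drop starts outside the window (ts - WINDOW_SECONDS, ts].
        while window and window[0] <= ts - WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_SESSIONS_PER_WINDOW:
            flagged.add(ip)
    return flagged
```

Flagged IPs would then be candidates for exclusion or review, alongside the list-based filters.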

Internally Generated Traffic

Based on IP address, Triton Digital removes internally generated stream session data from collected measurement data. Triton Digital’s staff uses a virtual private network (VPN), which uses the Internet to provide office users with secure access for internal traffic. Traffic from this VPN IP address is blocked from collection/reporting functions or excluded as invalid traffic. This rule applies to both data collection methods and is performed at the database level. Triton Digital also removes internal traffic generated by participating stations/publishers based on publisher-provided lists of IP addresses.
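A minimal Python sketch of IP-based internal traffic removal. The VPN address and publisher lists below are hypothetical placeholders from IETF documentation ranges, and applying each publisher’s list only to that publisher’s own sessions is an assumption made for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set

# Hypothetical stand-ins for the Triton Digital VPN IP and publisher-provided lists.
INTERNAL_VPN_IPS: Set[str] = {"203.0.113.10"}
PUBLISHER_INTERNAL_IPS: Dict[str, Set[str]] = {
    "station-a": {"198.51.100.20", "198.51.100.21"},
}


@dataclass
class StreamSession:
    publisher: str
    ip: str


def exclude_internal(sessions: Iterable[StreamSession]) -> List[StreamSession]:
    kept = []
    for s in sessions:
        ip = str(ipaddress.ip_address(s.ip))  # normalize the textual form
        if ip in INTERNAL_VPN_IPS:
            continue  # Triton Digital staff traffic
        if ip in PUBLISHER_INTERNAL_IPS.get(s.publisher, set()):
            continue  # the publisher's own internal traffic (assumed per-publisher scope)
        kept.append(s)
    return kept


result = exclude_internal([
    StreamSession("station-a", "203.0.113.10"),
    StreamSession("station-a", "198.51.100.20"),
    StreamSession("station-b", "198.51.100.20"),
    StreamSession("station-a", "192.0.2.5"),
])
print([(s.publisher, s.ip) for s in result])
# [('station-b', '198.51.100.20'), ('station-a', '192.0.2.5')]
```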

Inactivity Rule

Triton Digital applies a specific “inactivity rule”, under which a session stops contributing additional time spent listening to reported metrics after a predetermined threshold. Sessions with a duration greater than twenty-four hours are truncated at the twenty-four-hour mark in accordance with this rule. The time accumulated prior to this threshold is considered potentially valid for the session. In addition to this inactivity rule, the session is assessed against Triton Digital’s suite of invalid traffic detection procedures.
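The truncation can be sketched in a few lines of Python. For example, a 30-hour session (108,000 seconds) is credited with 24 hours (86,400 seconds), while a 90-minute session is unaffected. The function name is hypothetical, and the sketch covers only the truncation, not the other invalid traffic checks the session still goes through.

```python
MAX_CREDITED_SECONDS = 24 * 60 * 60  # twenty-four-hour inactivity threshold (86,400 s)


def credited_duration(duration_seconds: float) -> float:
    """Time spent listening that a session may contribute under the inactivity rule."""
    # Time up to the threshold remains potentially valid; time beyond it is truncated.
    return min(duration_seconds, MAX_CREDITED_SECONDS)


print(credited_duration(30 * 3600))  # 86400
print(credited_duration(90 * 60))    # 5400
```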

Publishers may apply additional inactivity rules, under which the digital stream and measurement of a session continue once the user confirms continued listening.