Precision vs accuracy in digital analytics

in #data, 3 years ago


Short version:
You will lose a valuable portion of your life worrying about the accuracy of digital analytics data. Accuracy does not exist!
Get over it. Focus on the quality of tagging and implementation. Do your best and then live with your data.
Focus on trends and changes, not absolute numbers. If you rely on absolute numbers, e.g. sales, use a different system.

The issue of data quality and accuracy in digital analytics is something that most digital analysts have no option but to learn and internalise very quickly, especially when people start asking why numbers don’t match. However, it is often easy for us to forget that our colleagues don’t live and breathe this data as we do. This post is therefore a reminder of the essential facts about why digital analytics data can’t necessarily be taken as fact.

Why is accuracy a wild goose chase?
Most people first recognise a problem with web analytics data because they are trying to reconcile absolute numbers between two different systems, for example when comparing visits in Google Analytics with clicks as reported by an ad tracking tool. You can waste a very valuable portion of your life trying to figure this out, when in truth you never really will. The following are the key reasons why these numbers don’t match, or why analytics data doesn’t match ‘reality’:

Terminology and Definitions — The terminology and definitions used to calculate metrics usually differ slightly from one tool to the next. For example, unique visitors must always be unique visitors within [a certain timeframe]. Different vendors may use different timeframes. Neither is right or wrong; they are just different. The same principle applies to lots of other metrics, sometimes on a much more subtle level.
No Standards — There are no agreed standards for these definitions. There have been attempts in the past to create standards, but they never got off the ground.


Complex web and app technologies — The internet is composed of a huge array of different technologies, all of which are constantly evolving. These technologies play a big part in the accuracy of data collection. Tracking methods such as cookies, packet sniffers and IP addresses all collect data in different ways, and each has its pros and cons. Then there is the development technology itself: right now it’s not uncommon for large portions of a website to be built in a JavaScript single-page app framework. These frameworks introduce a whole new layer of complexity when it comes to tagging.

Blockers — Browser technology increasingly gives savvy users the ability to opt out of being tracked or seeing ads.
Machine traffic — Robots and spiders crawl web pages, for example to index their content for search engines. Data quality in digital analytics is a race to keep up with these creatures!
Implementation and tagging processes — Probably the biggest culprit. For most businesses, managing tagging (yes, even with a dynamic tag management system) is operationally complex and incredibly hard to keep on top of. Things change and break. New content goes live untagged. These are simple facts of life.
Etc — the list goes on. The point is that you will never uncover the actual reasons.
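To make the point about definitions concrete, here is a minimal sketch of how the same raw data can yield two different "unique visitors" figures depending on the timeframe. The hit log, the visitor IDs and both function names are made up for illustration; they do not reflect how any particular vendor computes the metric.

```python
from datetime import date

# Hypothetical hit log: (day of the hit, visitor ID). Invented data.
hits = [
    (date(2024, 1, 1), "a"),
    (date(2024, 1, 1), "a"),  # same visitor, second page view that day
    (date(2024, 1, 1), "b"),
    (date(2024, 1, 2), "a"),
    (date(2024, 1, 3), "a"),
    (date(2024, 1, 3), "c"),
]


def unique_visitors_over_period(hits):
    """Count each visitor once for the whole period."""
    return len({visitor for _, visitor in hits})


def unique_visitors_daily_summed(hits):
    """Count each visitor once per day, then add the days together."""
    return len({(day, visitor) for day, visitor in hits})


period_uniques = unique_visitors_over_period(hits)
daily_uniques = unique_visitors_daily_summed(hits)
print(period_uniques, daily_uniques)  # 3 5
```

Both figures are legitimate answers to "how many unique visitors did we have?". They simply answer slightly different questions, which is why two tools can disagree without either being broken.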

Get over it!
The issue of data accuracy can cripple companies and cause vast amounts of wasted time. In truth there is no solution; it is much better to:

Do your best with tagging and tracking, but also realise that it will never be perfect. Don’t make decisions based on the supposed match between data and reality.
Understand the limitations in as much detail as possible and ensure that all recipients of web reporting and analysis are familiar with what the numbers do and don’t tell them.

Focus on trends and segments, and not on absolute numbers. This is easy to do when the focus is on analysis and not pure reporting; insight never comes from pure numbers.
Where numbers such as unique visitors are required for decision making, confidence levels should be used to make reasonable judgements about those numbers.
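One rough way to put a range around a reported number is sketched below. It assumes you have some metric that exists both in the analytics tool and in a system of record (orders, say), so that you can measure how much the tool typically captures. The capture rates, the function name `plausible_range` and the choice of two standard deviations are all assumptions for illustration. The result is a rule-of-thumb range, not a formal confidence interval, and it presumes that under-counting affects the metric you care about to a similar degree.

```python
from statistics import mean, stdev

# Hypothetical weekly capture rates: orders seen by the analytics tool
# divided by orders in the back-end system. Invented data.
weekly_capture_rates = [0.88, 0.91, 0.86, 0.90, 0.89]


def plausible_range(reported, capture_rates, spread_factor=2.0):
    """Turn a reported count into a (low, high) range for the real figure.

    Assumes the tool captures roughly the same share of the real activity
    as it did in the periods the capture rates were measured over.
    """
    centre = mean(capture_rates)
    spread = spread_factor * stdev(capture_rates)
    low_rate, high_rate = centre - spread, centre + spread
    if low_rate <= 0:
        raise ValueError("capture rates vary too much to give a usable range")
    # A higher capture rate implies a smaller real figure, and vice versa.
    return reported / high_rate, reported / low_rate


low, high = plausible_range(10_000, weekly_capture_rates)
print(f"Reported 10,000; plausibly between {low:,.0f} and {high:,.0f}")
```

The width of the range is itself useful: if it is too wide to support the decision at hand, that is a signal to use a different system for that number.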

If we set a consistent baseline of data that is as accurate as we can make it, then we can use this data to make reliable judgements about trends and draw conclusions from time-series analyses.
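A small illustration of why a consistent baseline matters more than a correct one: in the sketch below, two hypothetical tools each miss a different but steady share of visits. All figures are invented, and the steady under-count is the key assumption. If a tool's capture rate changes from week to week (for example because tagging broke), the trend is no longer trustworthy either.

```python
def pct_changes(series):
    """Week-over-week change in percent, rounded to one decimal place."""
    return [
        round((current - previous) / previous * 100, 1)
        for previous, current in zip(series, series[1:])
    ]


# Invented weekly visit counts. Tool A consistently sees 90% of the
# real visits and tool B consistently sees 80%.
real_visits = [10000, 11000, 9900, 12000]
tool_a = [9000, 9900, 8910, 10800]
tool_b = [8000, 8800, 7920, 9600]

print(pct_changes(tool_a))  # [10.0, -10.0, 21.2]
print(pct_changes(tool_b))  # [10.0, -10.0, 21.2]
```

The absolute numbers never agree, yet both tools tell the same story about what changed and by how much.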
