Unless the Numbers Reflect People, They’re Just Numbers

For the media industry, the period between March and May is go time. The many upfront events that span the media landscape are no longer bound to individual platforms and technologies, and the expanding content marketplace presents both a wealth of opportunity and an expanse of information for ad buyers and sellers to navigate, especially amid the growing conversation about big data for measurement.

For advertisers, numbers are critical this time of year. And as TV consumption fragments amid rising digital engagement, they take on even greater importance. How important? An Ampere Analysis study found that spend on content totaled about $220 billion in 2021, led by streaming powerhouse Netflix. And advertisers, knowing that Americans streamed almost 15 million years’ worth of video last year, are rallying: worldwide digital ad spend surged more than 29% in 2021 to eclipse $491 billion.

What’s more, consumers have no plans to change the trajectory of the streaming industry: 93% of streaming subscribers say they plan to increase their usage over the next year. That doesn’t mean, however, that traditional TV content is out of the picture. Quite the opposite, as the average adult spends more than twice as much time per day with live TV as with connected TV (CTV) content.

The increasing abundance of content presents a growing wealth of choice for consumers, but the myriad platforms, devices and services can present measurement challenges for advertisers. Additionally, the explosion of choice has not created more time to engage with content, nor has it created more people. But big data, including that which comes from smart TVs (automatic content recognition, or ACR) and cable boxes (return path data, or RPD), has a way of suggesting otherwise. The data from cable boxes and smart TVs also provides little insight into streaming activity: Cable boxes, by definition, provide traditional TV data, and ACR often shuts off when audiences use native apps, including Netflix.

Big data was never intended to be used for measurement, and it isn’t reflective of actual people. There is no mistaking the value of RPD and ACR, as they provide scale to measurement, but they reflect devices, not viewers. The data by itself can’t tell you who’s watching and who’s not, which is a fundamental need for advertisers. And when people are removed from the equation, the numbers just won’t add up.

Take ACR data, for example, which identifies images on the screens of smart TVs. This data can be very useful in audience measurement, but by itself, it does nothing more than identify what’s on a screen. RPD is similar, yet it can’t even verify that a TV set is on. That’s why one-fourth of all set-top-box impressions come from TVs that aren’t even on.

Beyond being unable to tell who’s using a device or a screen, big data is inherently biased, and the bias depends on the data type. In order for big data to truly represent the U.S. population, every TV household would need to have the exact same TV set and access programming through the exact same data stream. That’s why all big data sets need to be level set, or calibrated, with people-based panels that reflect the diversity of the U.S. population.

Importantly, the World Federation of Advertisers, the Association of National Advertisers and the comparable organizations in over 30 other nations have unanimously stated that the future audience measurement system for screen media must be a combination of quality panel and big data.

Without panel data, measurement doesn’t capture diversity. Not only do we know that TV households will never all access the same content on the same devices, we also know that household makeup is as varied as the country itself. That’s where big data-based measurement misses the mark, and significantly.

For example, Hispanics represent just under 20% of the U.S. population, but big data significantly undercounts this audience, along with many others. When measurement is based on RPD alone, Nielsen analyses have found that it underrepresents Hispanic homes by 30%. To put that into perspective, consider this: The 2020 U.S. Census determined that the Hispanic population was just over 62 million. If half of that population (about 31 million people) is watching TV at a given time and advertisers leverage RPD data for measurement, advertisers could be reaching roughly 9 million more people than they’d be aware of.
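The arithmetic behind that estimate can be sketched in a few lines of Python. The 50% viewing share is the hypothetical used above, and applying the household-level 30% underrepresentation rate directly to people is a simplifying assumption; the function and variable names are illustrative only.

```python
# Back-of-the-envelope version of the estimate above.
hispanic_population = 62_000_000  # 2020 U.S. Census figure cited above ("just over 62 million")
share_watching = 0.5              # hypothetical: half the population watching at a given time
rpd_underrepresentation = 0.30    # cited for Hispanic homes; applied to people here as an assumption


def uncounted_viewers(population, share, under_rate):
    """Viewers who are watching but missing from an RPD-only count."""
    actual_viewers = population * share
    return actual_viewers * under_rate


missed = uncounted_viewers(hispanic_population, share_watching, rpd_underrepresentation)
print(f"{missed / 1e6:.1f} million viewers uncounted")
```

Under these assumptions the result is about 9.3 million, in line with the roughly 9 million cited above.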

Importantly, the 30% underrepresentation is an average. At a program level, big data can under- and over-represent by much bigger margins, for both the general population and diverse audiences. For example, a Nielsen study into the variances between big data measurement and its gold-standard panel-based measurement found that RPD measurement overstated the total U.S. impressions for a primetime program by 69%. By comparison, ACR measurement understated the total by 12%. For a sporting event, RPD measurement understated the Hispanic audience by 47%, while ACR data overstated the same audience by 12%.
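One way to read those percentages is as a signed deviation from the panel-based figure. The sketch below assumes that definition, which the study summary above does not spell out, and uses a made-up panel baseline of 1,000,000 impressions purely for illustration.

```python
def percent_deviation(big_data_estimate, panel_estimate):
    """Signed deviation of a big data estimate from the panel baseline, in percent."""
    if panel_estimate <= 0:
        raise ValueError("panel_estimate must be positive")
    return (big_data_estimate - panel_estimate) / panel_estimate * 100


# Hypothetical impression counts, not figures from the study.
panel = 1_000_000
estimates = {
    "RPD": 1_690_000,  # chosen to correspond to a 69% overstatement
    "ACR": 880_000,    # chosen to correspond to a 12% understatement
}

for source, estimate in estimates.items():
    print(f"{source}: {percent_deviation(estimate, panel):+.0f}%")
```

A positive value means the big data source overstates the audience relative to the panel, and a negative value means it understates it.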

For advertisers, these measurement variances can be costly. The increasing supply of new data sources also adds complexity to measurement, especially when the data may not be connected to real people. Publishers and advertisers will always want the biggest reach possible, but certainly not without the analytical rigor needed to validate it.

As linear and digital converge, big data sources are critical inputs for measurement. But they’re not trustworthy as measurement sources by themselves. As consumers engage with more devices and more channels, it will be easy to point to data that suggests inflated engagement. Advertisers would certainly welcome the audience sizes that many alternative audience estimates suggest, but if they place their ad buys against those numbers, they’ll ultimately be paying for numbers that aren’t reflective of real people.