For audience analysis, it’s still a small data world after all

You’ve all heard it spoken, and you might even use it yourself, at the very least in your marketing copy. I’m talking about the all-encompassing expression du jour, “big data”: those massive quantities of information that require specialized techniques to store, search, and analyze. Though it’s now a classic case of tech locution cancer, it is nonetheless a genuinely thriving and much-discussed area, especially in the context of media audience analysis.

At the last few market research conferences I’ve attended, speakers have boldly stated that cross-platform big data collection will render panels obsolete. If true, this would represent nothing short of a revolution. But we’re not there yet. In fact, I argue that when it comes specifically to media audience analysis, panel-based data acquisition will always have an important role to play.

The advocates for “big data only” have a strong argument: if we had reliable cross-platform data for TV, radio, print, online and mobile for the same individuals, we could just passively harvest the information and interrogate the big dataset with any arbitrary query to test various hypotheses. This would be a lot easier than having to manage a plethora of panels, each fragmented by medium and content genre, which are expensive and require constant grooming to keep their members responsive.

Sadly, it’s not that simple. Where is this big data sourced? In the television context, cable providers, internet/mobile service operators and online platforms (such as YouTube, BBC iPlayer, etc.) can measure viewership data. Search engines, cookie tracking and social media can provide the behaviours and sentiments of that same audience. But how can all the data be aggregated when each collection platform has a different owner? What about the silent majority who may not leave any crumbs online for us to harvest? And how can we ever link data that is mostly anonymised?

Some of these are classic big data problems, applicable to any field of study. We need to find ways to combine different data sets to come up with a more universal view. This will only come through data sharing agreements and an acknowledgement that no one can afford to own all the data anymore.

The other problems raised can best be solved… with panels! To classify and link up disparately collected anonymous data, we need to build inference engines that profile the audience members. For this, one requires a “gold standard” data set. The larger this “small data” set, containing verified data and observed behaviour from a sample, the more accurate the inference engine will be. Audience panels, whose members are known (age, gender, income, location, etc.), whose viewership is tracked and whose behaviour is collected (passively or through interrogations/surveys/polls), are the ideal gold standard.
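To make the idea of an inference engine a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a description of any production system: the panel records, the three genre features, the age bands and the function names `train_centroids` and `infer_label` are all invented for the example, and the nearest-centroid rule is simply one of the easiest classifiers to show. The panel supplies the labelled “gold standard”; the anonymous viewer is then assigned the profile whose average viewing pattern is closest.

```python
from collections import defaultdict

# Hypothetical panel ("gold standard") records. Every value here is invented.
# Features are weekly viewing hours in the order [news, sport, drama];
# the label is the panel member's verified age band.
PANEL = [
    ([6.0, 1.0, 2.0], "55+"),
    ([5.0, 0.5, 3.0], "55+"),
    ([1.0, 4.0, 1.0], "18-34"),
    ([0.5, 5.0, 2.0], "18-34"),
]


def train_centroids(panel):
    """Average the feature vectors of the panel members in each age band."""
    sums = {}
    counts = defaultdict(int)
    for features, label in panel:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in totals]
        for label, totals in sums.items()
    }


def infer_label(centroids, features):
    """Return the age band whose centroid is nearest to the given features."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], features))


centroids = train_centroids(PANEL)

# An anonymous record harvested from the "big data" side: behaviour only, no profile.
anonymous_viewer = [5.5, 0.0, 2.5]
print(infer_label(centroids, anonymous_viewer))
```

With only four panel members the centroids are crude, which is exactly the point made above: the more verified members the panel contains, the better each profile is estimated and the more trustworthy the labels attached to the anonymous data become.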