December 07, 2010
By Shawn Rutledge
We live in the age of data (see http://blog.visibletechnologies.com/inflection-point-social-media/). Statistics is the language of data, and it’s time for marketers to up their game.
A popular tool in social media listening platforms is the “word cloud”. Typically these are generated by the rudimentary method of counting how often each word occurs in a collection of documents. Of course, the fact that words like “the” and “what” occur frequently is useless to marketers. Unfortunately, most products simply suppress common English words and stop there. These simple-minded techniques miss the mark on two fronts.
First, words follow a Zipf (power law) distribution. Think of this distribution as a kind of Hydra – every time you chop off the head (remove the most frequent words), a new one grows (a handful of the remaining words will still be much more frequent than the rest).
When studying social mentions of “Google”, it’s hardly surprising that “android” is mentioned frequently. On the other hand, there may be something interesting happening with the “android” brand. The incidence rate is high, but is it higher than what we should expect?
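One way to make “higher than what we should expect” concrete is to compare a word’s rate in the topic of interest with its rate in a background collection. The sketch below is illustrative only: the function name `rate_ratios`, the add-one smoothing and the toy token lists are all assumptions, not a description of any particular product.

```python
from collections import Counter

def rate_ratios(topic_tokens, background_tokens, smoothing=1.0):
    """Ratio of each topic word's observed rate to its expected (background) rate.

    Add-one style smoothing (an assumption of this sketch) keeps words that
    never appear in the background from producing a division by zero.
    """
    topic = Counter(topic_tokens)
    background = Counter(background_tokens)
    n_topic = sum(topic.values())
    n_background = sum(background.values())
    vocab_size = len(set(topic) | set(background))

    ratios = {}
    for word, count in topic.items():
        observed = (count + smoothing) / (n_topic + smoothing * vocab_size)
        expected = (background[word] + smoothing) / (n_background + smoothing * vocab_size)
        ratios[word] = observed / expected
    return ratios

# Toy, made-up data: a tiny "topic" collection and a tiny "background" one.
topic_tokens = "the android phone and the android tablet".split()
background_tokens = "the cat and the dog and the phone".split()

ratios = rate_ratios(topic_tokens, background_tokens)
for word, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{word:8s} {ratio:.2f}")
```

In this toy data “the” is as frequent as “android” inside the topic, yet its ratio falls below 1 because it is just as common everywhere else, so nothing has to be thrown away by hand.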
Second, every frequent word naively thrown away comes with missed opportunities, from use in brands (“Bank of America”, “The Home Depot”, “Windows 7”) to personal language (topics where discussion contains “I” much more than expected).
Marketers often think in terms of a brand being over- or under-indexed to a segment (direct marketing) or to another brand (affinity marketing). You might hear that Starbucks is over-indexed to women or over-indexed to Apple. These indexes are typically simple ratios (something twice as likely may be reported as an index of 200). Again, when applied to the tsunami of social media data, this technique misses the mark.
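For readers who have not computed one, an index of this kind can be sketched in a few lines. The function name `affinity_index`, the choice of the whole population as the baseline and the counts used are hypothetical, chosen only to reproduce the “twice as likely is an index of 200” convention described above.

```python
def affinity_index(segment_with_brand, segment_size,
                   population_with_brand, population_size):
    """Simple ratio index: 100 means the segment matches the baseline rate.

    Assumes the baseline is the whole population; some reports use
    "everyone outside the segment" instead.
    """
    segment_rate = segment_with_brand / segment_size
    baseline_rate = population_with_brand / population_size
    return 100.0 * segment_rate / baseline_rate

# Hypothetical counts: 20% of the segment likes the brand versus 10% overall,
# so the segment is twice as likely and the index comes out at about 200.
example_index = affinity_index(200, 1_000, 10_000, 100_000)
print(round(example_index))
```

Note that the segment size drops out of the result entirely: a segment of 10 people and a segment of 10 million with the same rates get the same index.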
With a small number of segments or brands, human judgment can decide whether a tiny demographic segment with a large index is worth the effort. But by crunching through the millions of segments and brands discussed in social media, we extract thousands of findings with huge indexes but no practical use. You end up wading through thousands of irrelevant correlations (e.g. a small community of 25 consumers who are fans of a t-shirt on Zazzle and are also ten thousand times more likely to be fans of Starbucks) and never finding the useful ones (e.g. Lady Gaga’s twenty-four million Facebook fans are 1.5 times as likely to be fans of Starbucks).
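To illustrate how a statistic can weigh the size of a segment together with its rate, here is a minimal sketch using the log-likelihood ratio (G) statistic on a 2x2 table. This is one possible choice, not necessarily the method the author has in mind, and every count below is invented for illustration rather than taken from real Starbucks, Zazzle or Facebook data.

```python
import math

def g_statistic(segment_fans, segment_size, total_fans, population_size):
    """Log-likelihood ratio (G) statistic for a 2x2 table:
    in segment / not in segment  x  brand fan / not a brand fan."""
    rows = (segment_size, population_size - segment_size)
    cols = (total_fans, population_size - total_fans)
    fans_outside = total_fans - segment_fans
    observed = (
        (segment_fans, segment_size - segment_fans),
        (fans_outside, population_size - segment_size - fans_outside),
    )
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # an empty cell contributes nothing to the sum
                expected = rows[i] * cols[j] / population_size
                g += o * math.log(o / expected)
    return 2.0 * g

def index(segment_fans, segment_size, total_fans, population_size):
    """Simple ratio index against the whole population (100 = baseline)."""
    return 100.0 * (segment_fans / segment_size) / (total_fans / population_size)

# Hypothetical population: 1,000,000 people, 1% of whom are fans of the brand.
population, fans = 1_000_000, 10_000

small = (5, 25)             # tiny segment: 5 of its 25 members are fans
large = (3_000, 200_000)    # big segment: 3,000 of its 200,000 members are fans

small_index = index(*small, fans, population)
large_index = index(*large, fans, population)
small_g = g_statistic(*small, fans, population)
large_g = g_statistic(*large, fans, population)

print(f"small segment: index {small_index:.0f}, G {small_g:.1f}")
print(f"large segment: index {large_index:.0f}, G {large_g:.1f}")
```

With these made-up numbers the tiny segment has by far the larger index, while the large segment has by far the larger G. Sorting findings by a statistic of this kind, rather than by raw index, is one way to keep the tiny curiosities from burying the segments that matter.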
We can blame these naïve approaches on early (and poor) tools in a nascent space. Researchers and software vendors bear the bulk of the responsibility to leverage the massive amounts of data now available in practical and accessible ways.
But vendors respond to the market. At least some of the problem comes from marketers who value (and buy) pretty pictures over rigorous analytics, familiar metrics (or worse: “magic” black-box solutions) over the principled and proven. Vendors are told over and over that marketers can’t understand or can’t be bothered to understand basic statistical approaches.
Principled statistical techniques that balance incidence of occurrence with rates of occurrence (like indexes) have existed for over a century. Those and newer techniques have been used in large-scale data mining over the last few decades (for example, in market-basket analysis). Yes, they may be a little harder to understand than some of these simpler metrics, and they can be a little overwhelming at first. But if you are a successful marketer in the age of data, you are becoming increasingly statistically literate. My challenge? Demand that your tools are too.