Have You Moved Beyond Basic Listening? A Look to Astrology for the Answer
Last month I ran across a fascinating visualization from David McCandless on his Information is Beautiful blog. I also really appreciated his openness and transparency about his entire process which he details here.
In case you are too lazy to click, he and his team crawled a year’s worth of online horoscope pages and analyzed the differences between the vocabularies used for each zodiac sign. He found that the horoscopes for each sign (at least over the course of a year) are all very similar to each other (I will make you click through to see his clever “meta” prediction that gives a generic horoscope for all signs for all days).
One thing I wanted to see improved was the use of simple term frequencies for the analysis. Words like “sure”, “feel”, and “keep” dominate the visual across all signs. He does highlight in red the words that appear only in that sign (within the top 50), and these are the more interesting words. By using stop words and this sort of “Venn diagram” approach (keeping the words that are distinct between the top and bottom word sets), he was able to tease out some of the signal from the noise.
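To make the frequency-plus-stop-words approach concrete, here is a minimal Python sketch of it. This is my reading of the idea, not David's actual code: the tiny corpus, the stop-word list, and the function names `top_terms` and `distinct_top_terms` are all made up for illustration.

```python
from collections import Counter

# Hypothetical toy corpus; the real analysis used a year of crawled horoscopes.
horoscopes = {
    "capricorn": "keep your plastic in your wallet and feel sure about money",
    "cancer": "feel sure your family will keep you close to home",
    "aries": "feel sure of yourself but keep impulsive urges in check",
}

# Hypothetical, hand-picked stop-word list.
STOP_WORDS = {"your", "in", "and", "about", "will", "you", "to", "of", "but", "the"}


def top_terms(text, n):
    """Return the n most frequent non-stop words in text."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    return [w for w, _ in Counter(words).most_common(n)]


def distinct_top_terms(corpus, n=50):
    """For each sign, keep only the top-n words that are in no other sign's top-n."""
    tops = {sign: top_terms(text, n) for sign, text in corpus.items()}
    distinct = {}
    for sign, words in tops.items():
        others = set()
        for other_sign, other_words in tops.items():
            if other_sign != sign:
                others.update(other_words)
        distinct[sign] = [w for w in words if w not in others]
    return distinct


distinct = distinct_top_terms(horoscopes)
```

On this toy data, “feel”, “sure”, and “keep” drop out because every sign shares them, while words such as “plastic” and “family” survive. The weakness is also visible: a word has to be either in or out of a top-n list, so a word that is common everywhere but much more common for one sign is lost.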
I constantly see very similar problems in social media monitoring and analysis. Word clouds based on frequencies really mask the underlying signal. Stop words usually help only a little (in fact they can hurt in some circumstances). I think of naïve word clouds as a litmus test: are you doing basic social monitoring? Or are you moving into social analytics and intelligence?
In our platform we use a measure based on Kullback–Leibler divergence which provides a principled approach to weighing statistical differences (a bit friendlier explanation is here where Facebook’s data team describes their yearly Meme analysis methodology). Let’s see what comes out if we analyze the same data with the Visible Intelligence platform:
Looking at Capricorn (my own sign), we see that stopping did give better results than raw term frequencies by exposing “willing”. But frequencies + stopping didn’t get to the strongest separating word, “plastic” (as in credit card). Stopping failed for Cancer: “family”, while a frequent word, occurred significantly more in the Cancer posts and shouldn’t have been stopped in this case.
It’s also interesting to look at which words occur much less frequently than expected with each zodiac sign (VI, Top 5 Neg). For example, the strongest negative word for Sagittarius is “emotional”, meaning it shows up far less often than expected for that sign.
Each word also has a measure of strength between 0 and 10 (although I don’t show the strength measures in the table). The strongest word is “impulsive” for Aries with a score of less than 1 out of 10 (0.958). This supports the conclusion that, indeed, the language used around all signs is very similar.
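For readers who want to experiment, here is a rough Python sketch of scoring words by their contribution to a Kullback–Leibler divergence. It is not the measure used in our platform, whose details (including the 0 to 10 strength scale) are not reproduced here. It is just the textbook per-word term p(w|sign) · log(p(w|sign) / p(w|background)), with add-alpha smoothing that I chose for the example. The function name `kl_contributions` and the toy token lists are hypothetical.

```python
import math
from collections import Counter


def kl_contributions(target_tokens, background_tokens, alpha=1.0):
    """Per-word terms of KL(target || background), with add-alpha smoothing.

    A positive score means the word is over-represented in the target,
    a negative score means it is under-represented. The scores sum to
    the (smoothed) KL divergence, which is never negative.
    """
    target = Counter(target_tokens)
    background = Counter(background_tokens)
    vocab = set(target) | set(background)
    # Smoothing keeps every probability non-zero, so the log is always defined.
    target_total = sum(target.values()) + alpha * len(vocab)
    background_total = sum(background.values()) + alpha * len(vocab)
    scores = {}
    for word in vocab:
        p = (target[word] + alpha) / target_total
        q = (background[word] + alpha) / background_total
        scores[word] = p * math.log(p / q)
    return scores


# Made-up token lists standing in for one sign and for all signs combined.
cancer_tokens = "family home family feel sure family".split()
all_tokens = "feel sure keep feel sure keep family plastic feel sure".split()

scores = kl_contributions(cancer_tokens, all_tokens)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Sorting by score puts the over-represented words first and the under-represented ones last, so a single pass yields both the “top” and the “top negative” lists. Unlike a stop list, nothing is thrown away up front: a frequent word such as “family” still ranks highly when it is even more frequent for one sign than in general.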
Finally, while they don’t come into play too much here, phrases can be much more informative than individual words. We do see “come along” (Capricorn) and “each other” (Gemini) showing up where the individual words wouldn’t.
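One simple way to let phrases compete with single words is to add two-word sequences to the token stream before scoring. The sketch below assumes plain whitespace tokenization and only handles bigrams. The helper name `with_bigrams` is hypothetical, and this is not necessarily how our platform extracts phrases.

```python
from collections import Counter


def with_bigrams(tokens):
    """Return the original tokens followed by every adjacent two-word phrase."""
    bigrams = [f"{first} {second}" for first, second in zip(tokens, tokens[1:])]
    return list(tokens) + bigrams


# Made-up sentence for illustration.
tokens = "good things come along when friends come along".split()
counts = Counter(with_bigrams(tokens))
```

Here “come along” is counted as its own term, so any frequency or divergence score can rank it separately from “come” and “along”.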
So horoscopes are a fun toy example, but what about real business problems? We’ve seen a lot of excitement from our customers about a strong general capability to summarize comparative text differences. They are asking for summaries of:
- what an author writes about
- the topics discussed on a site
- differences between media types (Twitter vs. blogs)
- differences between geographies (US vs. Canada)
- what has changed since they last logged in (time-based changes)
- what’s happening right now (emerging and nascent topics)
- what’s happening during a crisis or event
- what changes due to a marketing campaign
- what people like and don’t like about their brand (sentiment comparisons)
- positive and negative product traits
- differences between their brand and their competitors
…in general, summarizing any query or differences between any two queries.
I want to thank David again for the fascinating use case and in the spirit of his transparency, here is a Google doc with the detailed numbers.