December 11, 2008
/ by jay.krall
Kingsley Idehen, a co-founder of DBpedia
For public relations professionals, finding mentions of a particular brand or product is getting more challenging as the vast clutter of the Web continues to grow. While paid monitoring services like those offered by Cision and others can help, for those using free-text search engines like Google for media monitoring, combing through pages of irrelevant search results has become routine. Acronyms, for example, pose a problem: how many instances of the term “HP” referring to “horsepower” do you have to sift through to find articles about Hewlett-Packard products? Plenty.
Worse yet, the longer your queries get, the harder it is for search engines to find what you really want. It’s almost 2009. With all this technological innovation happening so fast, why does it seem like computers still can’t read very well? If they were more literate, the monitoring of media and social media for brand mentions would be a lot easier for everyone.
That’s just one practical argument for the importance of the Semantic Web. First described in 1999 by World Wide Web Consortium director Tim Berners-Lee, the Semantic Web, also referred to as Web 3.0, is often described as a vision for the next generation of the Web: pages that can search each other and pull from each other’s data intelligently, melding Web sites and news feeds into precisely honed, individual Web experiences. But actually, the technologies of the Semantic Web are already hard at work, thanks to a group of computer scientists from around the world who are making Berners-Lee’s vision a reality.
Kingsley Idehen, CEO of OpenLink Software, is one of those pioneers. He is one of the creators of DBpedia, a Semantic Web tool that culls data from Wikipedia in amazingly precise ways. The project is a collaboration of OpenLink Software, the University of Leipzig and Freie Universität Berlin. Simply put, it divides up Wikipedia’s information into tags, and uses those tags to develop searches in which the subject is clearly defined, using a computer language that could soon be applied all across the Web. Beginning in late 2006, a program assigned 274 million tags describing nearly 1 billion facts to catalog Wikipedia in this way using the Resource Description Framework (RDF), a commonly accepted format for Semantic Web applications.
So, let’s say you want to learn about every TV sitcom ever produced that was set in New York City, which is an example query on DBpedia. There is no Wikipedia article devoted to that topic, and if you try googling “TV sitcoms set in New York City”, you’ll get everything from travel reviews of New York to gossip about sitcom stars. (Google has researched and patented Semantic Web technologies as well.) In a text search for sitcoms set in New York, the crux of your question is so dependent on context that a search engine struggles to figure out what you want. But in DBpedia, using a query language called SPARQL (pronounced “sparkle”) you can define it more specifically: The subject of my search is sitcoms, and New York City is a property of the type of sitcoms I want.
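To make the idea concrete, here is a rough sketch in Python of how facts stored as subject, property and value can answer the sitcom question. It is a toy illustration, not DBpedia's actual data or vocabulary: the shows listed, the property names "type" and "setting", and the helper functions are all assumptions made up for this example. A real SPARQL query would express the same two conditions as patterns matched against RDF data.

```python
# Toy illustration only: these facts, the property names "type" and
# "setting", and both helper functions are hypothetical. They are not
# DBpedia's real vocabulary or data.
TRIPLES = [
    ("Seinfeld", "type", "Sitcom"),
    ("Seinfeld", "setting", "New York City"),
    ("Friends", "type", "Sitcom"),
    ("Friends", "setting", "New York City"),
    ("Cheers", "type", "Sitcom"),
    ("Cheers", "setting", "Boston"),
    ("Law & Order", "type", "Drama"),
    ("Law & Order", "setting", "New York City"),
]


def subjects_with(triples, prop, value):
    """Return every subject that has the given property set to the given value."""
    return {s for s, p, o in triples if p == prop and o == value}


def sitcoms_set_in(triples, city):
    """Subjects that are sitcoms AND have the given city as their setting."""
    sitcoms = subjects_with(triples, "type", "Sitcom")
    set_in_city = subjects_with(triples, "setting", city)
    return sorted(sitcoms & set_in_city)


print(sitcoms_set_in(TRIPLES, "New York City"))  # ['Friends', 'Seinfeld']
```

Because each fact names its subject and property explicitly, the drama set in New York and the sitcom set in Boston drop out without any guessing about context, which is the advantage over matching the words "sitcom" and "New York" in free text.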
“The whole idea behind DBpedia was to take a live and practical example of what the Web as a database was all about. What we did was take what was emerging as the global general-knowledge base, Wikipedia, and translate that into the format that eventually almost all of the Web data will take down the line,” Idehen says, noting that the project has “rekindled awareness, comprehension and appreciation of the original Semantic Web vision.”
The idea of the Semantic Web may be better described as the “linked data Web”. “If you look at the current Web, you simply have a Web of linked documents. All we’re talking about is, there’s a new level of abstraction that provides us with even more granular linking. There’s linking about the entities that the documents are about,” Idehen says.
“Say I put in a search for my name just as I would in Google. But what happens in this new world is that the page I get back doesn’t say, ‘Here are 99,000 pages.’ It’s going to say, ‘Well, I have a hundred pages about an actor called Kingsley. Here’s another hundred things related to music.’ This is what people have always expected from the Web. But I think for a while people have settled for full-text patterns.”
As software that helps Web publishers translate their content into RDF tags becomes more prevalent, the boundaries of language begin to melt away. The fact that instances of “HP” could be tagged with the meaning of the acronym is just the beginning. “If you had a Chinese database on Chinese, indigenous, natural medicines, and you were to juxtapose that with, say, Western medicine, you can now mesh, rather than mash, both worlds because beneath both of those realms are core concepts. That’s the kind of thing that the Web is going to facilitate, to allow these erstwhile disparate realms to mesh,” Idehen says.
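Returning to the “HP” example, the difference between matching text and matching a tagged meaning might be sketched like this in Python. The headlines, the entity identifiers and the function names are all hypothetical, invented for illustration; they do not come from any real monitoring service or RDF vocabulary.

```python
# Hypothetical data: each mention carries a tag saying which entity the
# text is about. The identifiers are made up for this sketch.
MENTIONS = [
    {"text": "HP unveils a new laptop line", "entity": "company:Hewlett-Packard"},
    {"text": "The engine produces 300 HP", "entity": "unit:horsepower"},
    {"text": "HP posts quarterly earnings", "entity": "company:Hewlett-Packard"},
]


def keyword_search(mentions, term):
    """Free-text matching: returns every mention containing the term."""
    return [m["text"] for m in mentions if term in m["text"]]


def mentions_of(mentions, entity):
    """Tag-based matching: returns only mentions tagged with the entity."""
    return [m["text"] for m in mentions if m["entity"] == entity]


print(len(keyword_search(MENTIONS, "HP")))  # 3, horsepower included
print(mentions_of(MENTIONS, "company:Hewlett-Packard"))
```

The keyword search returns all three headlines, while the tag-based lookup returns only the two about the company, since the horsepower mention carries a different identifier.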
The implications of the Semantic Web range far and wide, from databases to search engines, Web sites and content feeds. But ultimately, the phenomenon will make it easier to find exactly what you’re looking for.