http://www.economist.com/displaystory.cfm?story_id=11002939
Link: The semantic web | Start making sense | Economist.com.
"SOME new ideas take wing spontaneously. Others struggle to be born. The “semantic web” is definitely in the latter category. But it may have found its midwife in Reuters, a business-information company.
The semantic web (or “web 3.0”, as some people are trying to re-brand it) is the name given to the idea that the pages of the world wide web ought to carry more than just the meaning they are intended to convey to the human reader. They should also, the thinking goes, be tagged and flagged in ways that machines can make semantic sense of, as people make semantic sense of language. That way, machines could make instant connections that would take serious amounts of time for people to see, or might even elude them altogether.
To this end, the web's übergeeks, the World Wide Web Consortium, have approved all sorts of snazzy acronyms that are supposed to help. The Resource Description Framework (RDF), for example, is supposed to standardise keywords, important dates and so on in a machine-friendly manner. The Web Ontology Language (OWL) will then pick these up and make sense of them. And if those don't work there are hCards, hCalendars, hReviews and other so-called microformat flags that will wave themselves to indicate where to look for various types of data.
It sounds a mess and it is. (((So now let's add computers!)))
As a result it has been hard to persuade those who post web pages to include all the semantic-web stuff in their postings, too. Such marking up, as it is known, goes against the whole spirit of the web, which succeeded where similar ventures failed precisely because it was easy to use.
Reuters, however, believes it has overcome this problem. It recently launched a service called Calais that takes raw web pages (and, indeed, any other form of data) and does the marking up itself. (((Uh-oh.))) The acronyms can then get to work. That promises to imbue the streams of unstructured text and data sloshing around the internet with almost instant meaning.
The idea is that any website can send a jumble of text and code through Calais and receive back a list of “entities” that the system has extracted—mostly people, places and companies—and, even more importantly, their relationships. (((A real boon for assassination squads.))) It will, for instance, be able to recognise a pharmaceutical company's name and, on its own initiative, cross-reference that against data on clinical trials for new drugs that are held in government databases. Alternatively, it can chew up a thousand blogs and expose trends that not even the bloggers themselves were aware of.... (((THAT shouldn't be too hard.)))
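The round trip described in the quoted piece, raw text in and a list of entities plus their relationships out, can be sketched in a few lines of Python. This is a toy illustration only and not Calais's actual interface: the `GAZETTEER` lookup table, the function names `extract_entities` and `extract_relations`, and the `mentioned_with` relation label are all hypothetical, and a real service would presumably rely on statistical language processing rather than a fixed list of names.

```python
import re
from typing import NamedTuple

# Hypothetical lookup table of known names and their types.
# A real extraction service would not work from a fixed list like this.
GAZETTEER = {
    "Reuters": "Company",
    "Calais": "Product",
    "World Wide Web Consortium": "Organization",
}


class Entity(NamedTuple):
    text: str   # the name as it appears in the gazetteer
    kind: str   # the entity type, e.g. "Company"
    start: int  # character offset of the match in the input


def extract_entities(text):
    """Return every gazetteer name found in text, ordered by position."""
    found = []
    for name, kind in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append(Entity(name, kind, match.start()))
    return sorted(found, key=lambda entity: entity.start)


def extract_relations(text):
    """Link entities that share a sentence as (subject, predicate, object) triples."""
    triples = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        entities = extract_entities(sentence)
        for i, left in enumerate(entities):
            for right in entities[i + 1:]:
                triples.append((left.text, "mentioned_with", right.text))
    return triples


sample = (
    "Reuters recently launched a service called Calais. "
    "The World Wide Web Consortium approved RDF."
)
entities = extract_entities(sample)
triples = extract_relations(sample)

for entity in entities:
    print(entity.kind, entity.text)
for triple in triples:
    print(triple)
```

The three-part tuples loosely mirror the subject, predicate, object shape that RDF uses for statements, which is why output of this kind could in principle be handed on to the acronyms mentioned earlier. Co-occurrence in a sentence is about the crudest possible notion of a "relationship"; it is used here only to keep the sketch short.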